diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
index e72c70d8c3d..595ce5c29f3 100644
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -1,15 +1,8 @@
-
-##### System information (version)
-- OpenCV => :grey_question:
@@ -27,4 +20,44 @@ This is a template helping you to create an issue which can be processed as quickly as possible
 // C++ code example
 ```
 or attach as .txt or .zip file
--->
\ No newline at end of file
+-->
+
+##### Issue submission checklist
+
+ - [ ] I report the issue, it's not a question
+
+ - [ ] I checked the problem with documentation, FAQ, open issues,
+       answers.opencv.org, Stack Overflow, etc. and have not found a solution
+
+ - [ ] I updated to the latest OpenCV version and the issue is still there
+
+ - [ ] There is reproducer code and related data files: videos, images, onnx, etc.
+
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 210a253113c..d461389b198 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,9 +1,11 @@
-
+### Pull Request Readiness Checklist
-### This pullrequest changes
+See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
-
+- [ ] I agree to contribute to the project under the OpenCV (BSD) License.
+- [ ] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
+- [ ] The PR is proposed to the proper branch
+- [ ] There is a reference to the original bug report and related work
+- [ ] There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable.
+      The patch to opencv_extra has the same branch name.
+- [ ] The feature is well documented and sample code can be built with the project CMake
diff --git a/.travis.yml b/.travis.yml
index de69f1a5a39..815fcc6c4c2 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -4,7 +4,7 @@ compiler:
   - clang
 before_script:
   - cd ../
-  - git clone --branch 3.4 --depth=1 https://github.com/opencv/opencv.git
+  - git clone --branch master --depth=1 https://github.com/opencv/opencv.git
   - mkdir build-opencv
   - cd build-opencv
   - cmake
diff --git a/README.md b/README.md
index 4e46b9af07e..5d0efe8e21f 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,8 @@ use CMake's `BUILD_opencv_*` options. Like in this example:
 $ cmake -DOPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -DBUILD_opencv_legacy=OFF <opencv_source_directory>
 ```
 
+If you also want to build the samples from the "samples" folder of each module, include the `-DBUILD_EXAMPLES=ON` option (a sample invocation is sketched below).
+
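+A minimal sketch of a full configure step with samples enabled (the placeholder
+paths follow the example above and are not the only valid layout):
+
+```
+$ cmake -DOPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -DBUILD_EXAMPLES=ON <opencv_source_directory>
+```
+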
 If you prefer using the gui version of cmake (cmake-gui), then you can add `opencv_contrib` modules within `opencv` core by doing the following:
 
 1. start cmake-gui
diff --git a/modules/README.md b/modules/README.md
index 6d51a4db49d..b333043277b 100644
--- a/modules/README.md
+++ b/modules/README.md
@@ -10,6 +10,8 @@
 $ cmake -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D BUILD_opencv_<reponame>=ON <opencv_source_directory>
 ```
 
+If you also want to build the samples from the "samples" folder of each module, include the `-DBUILD_EXAMPLES=ON` option.
+
diff --git a/modules/alphamat/samples/information_flow_matting.cpp b/modules/alphamat/samples/information_flow_matting.cpp
new file mode 100644
--- /dev/null
+++ b/modules/alphamat/samples/information_flow_matting.cpp
+#include <iostream>
+#include "opencv2/highgui.hpp"
+#include <opencv2/core.hpp>
+#include <opencv2/imgcodecs.hpp>
+#include <opencv2/alphamat.hpp>
+
+using namespace std;
+using namespace cv;
+using namespace cv::alphamat;
+
+const char* keys =
+    "{img || input image name}"
+    "{tri || input trimap image name}"
+    "{out || output image name}"
+    "{help h || print help message}"
+;
+
+int main(int argc, char* argv[])
+{
+    CommandLineParser parser(argc, argv, keys);
+    parser.about("This sample demonstrates Information Flow Alpha Matting");
+
+    if (parser.has("help"))
+    {
+        parser.printMessage();
+        return 0;
+    }
+
+    string img_path = parser.get<string>("img");
+    string trimap_path = parser.get<string>("tri");
+    string result_path = parser.get<string>("out");
+
+    if (!parser.check()
+        || img_path.empty() || trimap_path.empty())
+    {
+        parser.printMessage();
+        parser.printErrors();
+        return 1;
+    }
+
+    Mat image, tmap;
+
+    image = imread(img_path, IMREAD_COLOR);  // Read the input image file
+    if (image.empty())
+    {
+        printf("Cannot read image file: '%s'\n", img_path.c_str());
+        return 1;
+    }
+
+    tmap = imread(trimap_path, IMREAD_GRAYSCALE);
+    if (tmap.empty())
+    {
+        printf("Cannot read trimap file: '%s'\n", trimap_path.c_str());
+        return 1;
+    }
+
+    Mat result;
+    infoFlow(image, tmap, result);
+
+    if (result_path.empty())
+    {
+        namedWindow("result alpha matte", WINDOW_NORMAL);
+        imshow("result alpha matte", result);
+        waitKey(0);
+    }
+    else
+    {
+        imwrite(result_path, result);
+        printf("Result saved: '%s'\n", result_path.c_str());
+    }
+
+    return 0;
+}
diff --git a/modules/alphamat/samples/input_images/plant.jpg b/modules/alphamat/samples/input_images/plant.jpg
new file mode 100644
index 00000000000..c6f30953397
Binary files /dev/null and b/modules/alphamat/samples/input_images/plant.jpg differ
diff --git a/modules/alphamat/samples/output_mattes/plant_result.jpg b/modules/alphamat/samples/output_mattes/plant_result.jpg
new file mode 100644
index 00000000000..4ec7e29c6b0
Binary files /dev/null and b/modules/alphamat/samples/output_mattes/plant_result.jpg differ
diff --git a/modules/alphamat/samples/trimaps/plant.png b/modules/alphamat/samples/trimaps/plant.png
new file mode 100755
index 00000000000..6c646b9192d
Binary files /dev/null and b/modules/alphamat/samples/trimaps/plant.png differ
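For context when reviewing the sample above, a typical run might look like this. This is a sketch: with `-DBUILD_EXAMPLES=ON`, OpenCV names sample binaries `example_<module>_<sample>`, so the exact binary name and output path depend on your build; the input paths below are the assets added by this patch:

```
$ ./example_alphamat_information_flow_matting \
    --img=modules/alphamat/samples/input_images/plant.jpg \
    --tri=modules/alphamat/samples/trimaps/plant.png \
    --out=plant_alpha.png
```

The `--img`, `--tri`, and `--out` flags map to the `keys` string parsed by `CommandLineParser`; when `--out` is omitted, the matte is shown in a window instead of being saved.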
diff --git a/modules/alphamat/src/3rdparty/KDTreeVectorOfVectorsAdaptor.h b/modules/alphamat/src/3rdparty/KDTreeVectorOfVectorsAdaptor.h
new file mode 100644
index 00000000000..893aae70ce6
--- /dev/null
+++ b/modules/alphamat/src/3rdparty/KDTreeVectorOfVectorsAdaptor.h
@@ -0,0 +1,117 @@
+/***********************************************************************
+ * Software License Agreement (BSD License)
+ *
+ * Copyright 2011-16 Jose Luis Blanco (joseluisblancoc@gmail.com).
+ *   All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *************************************************************************/
+
+#pragma once
+
+#include "nanoflann.hpp"
+
+#include <vector>
+
+// ===== This example shows how to use nanoflann with these types of containers: =======
+//typedef std::vector<std::vector<double> > my_vector_of_vectors_t;
+//typedef std::vector<Eigen::VectorXd> my_vector_of_vectors_t;   // This requires #include <Eigen/Dense>
+// =====================================================================================
+
+
+/** A simple vector-of-vectors adaptor for nanoflann, without duplicating the storage.
+ *  The i'th vector represents a point in the state space.
+ *
+ *  \tparam DIM If set to >0, it specifies a compile-time fixed dimensionality for the points in the data set, allowing more compiler optimizations.
+ *  \tparam num_t The type of the point coordinates (typically, double or float).
+ *  \tparam Distance The distance metric to use: nanoflann::metric_L1, nanoflann::metric_L2, nanoflann::metric_L2_Simple, etc.
+ *  \tparam IndexType The type for indices in the KD-tree index (typically, size_t or int)
+ */
+template <class VectorOfVectorsType, typename num_t = double, int DIM = -1, class Distance = nanoflann::metric_L2, typename IndexType = size_t>
+struct KDTreeVectorOfVectorsAdaptor
+{
+    typedef KDTreeVectorOfVectorsAdaptor<VectorOfVectorsType, num_t, DIM, Distance> self_t;
+    typedef typename Distance::template traits<num_t, self_t>::distance_t metric_t;
+    typedef nanoflann::KDTreeSingleIndexAdaptor< metric_t, self_t, DIM, IndexType> index_t;
+
+    index_t* index; //! The kd-tree index for the user to call its methods as usual with any other FLANN index.
+
+    /// Constructor: takes a const ref to the vector of vectors object with the data points
+    KDTreeVectorOfVectorsAdaptor(const size_t /* dimensionality */, const VectorOfVectorsType &mat, const int leaf_max_size = 10) : m_data(mat)
+    {
+        assert(mat.size() != 0 && mat[0].size() != 0);
+        const size_t dims = mat[0].size();
+        if (DIM > 0 && static_cast<int>(dims) != DIM)
+            throw std::runtime_error("Data set dimensionality does not match the 'DIM' template argument");
+        index = new index_t( static_cast<int>(dims), *this /* adaptor */, nanoflann::KDTreeSingleIndexAdaptorParams(leaf_max_size) );
+        index->buildIndex();
+    }
+
+    ~KDTreeVectorOfVectorsAdaptor() {
+        delete index;
+    }
+
+    const VectorOfVectorsType &m_data;
+
+    /** Query for the \a num_closest closest points to a given point (entered as query_point[0:dim-1]).
+     *  Note that this is a short-cut method for index->findNeighbors().
+     *  The user can also call index->... methods as desired.
+     * \note nChecks_IGNORED is ignored but kept for compatibility with the original FLANN interface.
+     */
+    //inline void query(const num_t *query_point, const size_t num_closest, IndexType *out_indices, num_t *out_distances_sq, const int nChecks_IGNORED = 10) const
+    inline void query(const num_t *query_point, const size_t num_closest, IndexType *out_indices, num_t *out_distances_sq) const
+    {
+        nanoflann::KNNResultSet<num_t, IndexType> resultSet(num_closest);
+        resultSet.init(out_indices, out_distances_sq);
+        index->findNeighbors(resultSet, query_point, nanoflann::SearchParams());
+    }
+
+    /** @name Interface expected by KDTreeSingleIndexAdaptor
+     * @{ */
+
+    const self_t & derived() const {
+        return *this;
+    }
+    self_t & derived() {
+        return *this;
+    }
+
+    // Must return the number of data points
+    inline size_t kdtree_get_point_count() const {
+        return m_data.size();
+    }
+
+    // Returns the dim'th component of the idx'th point in the class:
+    inline num_t kdtree_get_pt(const size_t idx, const size_t dim) const {
+        return m_data[idx][dim];
+    }
+
+    // Optional bounding-box computation: return false to default to a standard bbox computation loop.
+    // Return true if the BBOX was already computed by the class and returned in "bb" so it can be avoided to redo it again.
+    // Look at bb.size() to find out the expected dimensionality (e.g. 2 or 3 for point clouds)
+    template <class BBOX>
+    bool kdtree_get_bbox(BBOX & /*bb*/) const {
+        return false;
+    }
+
+    /** @} */
+};   // end of KDTreeVectorOfVectorsAdaptor
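Since the adaptor above is dense, here is a minimal, self-contained sketch of how it is typically used, mirroring the vector-of-vectors example named in its own comments. This snippet is illustrative and not part of the patch; the include path and data values are assumptions:

```cpp
#include <cstdio>
#include <vector>
#include "KDTreeVectorOfVectorsAdaptor.h"

typedef std::vector<std::vector<double> > my_vector_of_vectors_t;

int main()
{
    // Three 2-D points stored as plain nested vectors; the adaptor wraps
    // them without copying the storage.
    my_vector_of_vectors_t samples = { {0.0, 0.0}, {1.0, 0.5}, {2.0, 2.0} };

    // The constructor builds the kd-tree index immediately.
    typedef KDTreeVectorOfVectorsAdaptor<my_vector_of_vectors_t, double> my_kd_tree_t;
    my_kd_tree_t index(2 /* dimensionality */, samples, 10 /* leaf_max_size */);

    // Query the single nearest neighbor of a point.
    const double query_pt[2] = { 0.9, 0.4 };
    size_t ret_index;
    double out_dist_sqr;
    index.query(query_pt, 1, &ret_index, &out_dist_sqr);
    printf("nearest index=%zu dist_sqr=%f\n", ret_index, out_dist_sqr);
    return 0;
}
```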
diff --git a/modules/alphamat/src/3rdparty/nanoflann.hpp b/modules/alphamat/src/3rdparty/nanoflann.hpp
new file mode 100644
index 00000000000..a8e4667dda6
--- /dev/null
+++ b/modules/alphamat/src/3rdparty/nanoflann.hpp
@@ -0,0 +1,2040 @@
+/***********************************************************************
+ * Software License Agreement (BSD License)
+ *
+ * Copyright 2008-2009  Marius Muja (mariusm@cs.ubc.ca). All rights reserved.
+ * Copyright 2008-2009  David G. Lowe (lowe@cs.ubc.ca). All rights reserved.
+ * Copyright 2011-2016  Jose Luis Blanco (joseluisblancoc@gmail.com).
+ *   All rights reserved.
+ *
+ * THE BSD LICENSE
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *************************************************************************/
+
+/** \mainpage nanoflann C++ API documentation
+ *  nanoflann is a C++ header-only library for building KD-Trees, mostly
+ *  optimized for 2D or 3D point clouds.
+ * + * nanoflann does not require compiling or installing, just an + * #include in your code. + * + * See: + * - C++ API organized by modules + * - Online README + * - Doxygen + * documentation + */ + +#ifndef NANOFLANN_HPP_ +#define NANOFLANN_HPP_ + +#include +#include +#include +#include // for abs() +#include // for fwrite() +#include // for abs() +#include +#include // std::reference_wrapper +#include +#include + +/** Library version: 0xMmP (M=Major,m=minor,P=patch) */ +#define NANOFLANN_VERSION 0x132 + +// Avoid conflicting declaration of min/max macros in windows headers +#if !defined(NOMINMAX) && \ + (defined(_WIN32) || defined(_WIN32_) || defined(WIN32) || defined(_WIN64)) +#define NOMINMAX +#ifdef max +#undef max +#undef min +#endif +#endif + +namespace nanoflann { +/** @addtogroup nanoflann_grp nanoflann C++ library for ANN + * @{ */ + +/** the PI constant (required to avoid MSVC missing symbols) */ +template T pi_const() { + return static_cast(3.14159265358979323846); +} + +/** + * Traits if object is resizable and assignable (typically has a resize | assign + * method) + */ +template struct has_resize : std::false_type {}; + +template +struct has_resize().resize(1), 0)> + : std::true_type {}; + +template struct has_assign : std::false_type {}; + +template +struct has_assign().assign(1, 0), 0)> + : std::true_type {}; + +/** + * Free function to resize a resizable object + */ +template +inline typename std::enable_if::value, void>::type +resize(Container &c, const size_t nElements) { + c.resize(nElements); +} + +/** + * Free function that has no effects on non resizable containers (e.g. + * std::array) It raises an exception if the expected size does not match + */ +template +inline typename std::enable_if::value, void>::type +resize(Container &c, const size_t nElements) { + if (nElements != c.size()) + throw std::logic_error("Try to change the size of a std::array."); +} + +/** + * Free function to assign to a container + */ +template +inline typename std::enable_if::value, void>::type +assign(Container &c, const size_t nElements, const T &value) { + c.assign(nElements, value); +} + +/** + * Free function to assign to a std::array + */ +template +inline typename std::enable_if::value, void>::type +assign(Container &c, const size_t nElements, const T &value) { + for (size_t i = 0; i < nElements; i++) + c[i] = value; +} + +/** @addtogroup result_sets_grp Result set classes + * @{ */ +template +class KNNResultSet { +public: + typedef _DistanceType DistanceType; + typedef _IndexType IndexType; + typedef _CountType CountType; + +private: + IndexType *indices; + DistanceType *dists; + CountType capacity; + CountType count; + +public: + inline KNNResultSet(CountType capacity_) + : indices(0), dists(0), capacity(capacity_), count(0) {} + + inline void init(IndexType *indices_, DistanceType *dists_) { + indices = indices_; + dists = dists_; + count = 0; + if (capacity) + dists[capacity - 1] = (std::numeric_limits::max)(); + } + + inline CountType size() const { return count; } + + inline bool full() const { return count == capacity; } + + /** + * Called during search to add an element matching the criteria. + * @return true if the search should be continued, false if the results are + * sufficient + */ + inline bool addPoint(DistanceType dist, IndexType index) { + CountType i; + for (i = count; i > 0; --i) { +#ifdef NANOFLANN_FIRST_MATCH // If defined and two points have the same + // distance, the one with the lowest-index will be + // returned first. 
+ if ((dists[i - 1] > dist) || + ((dist == dists[i - 1]) && (indices[i - 1] > index))) { +#else + if (dists[i - 1] > dist) { +#endif + if (i < capacity) { + dists[i] = dists[i - 1]; + indices[i] = indices[i - 1]; + } + } else + break; + } + if (i < capacity) { + dists[i] = dist; + indices[i] = index; + } + if (count < capacity) + count++; + + // tell caller that the search shall continue + return true; + } + + inline DistanceType worstDist() const { return dists[capacity - 1]; } +}; + +/** operator "<" for std::sort() */ +struct IndexDist_Sorter { + /** PairType will be typically: std::pair */ + template + inline bool operator()(const PairType &p1, const PairType &p2) const { + return p1.second < p2.second; + } +}; + +/** + * A result-set class used when performing a radius based search. + */ +template +class RadiusResultSet { +public: + typedef _DistanceType DistanceType; + typedef _IndexType IndexType; + +public: + const DistanceType radius; + + std::vector> &m_indices_dists; + + inline RadiusResultSet( + DistanceType radius_, + std::vector> &indices_dists) + : radius(radius_), m_indices_dists(indices_dists) { + init(); + } + + inline void init() { clear(); } + inline void clear() { m_indices_dists.clear(); } + + inline size_t size() const { return m_indices_dists.size(); } + + inline bool full() const { return true; } + + /** + * Called during search to add an element matching the criteria. + * @return true if the search should be continued, false if the results are + * sufficient + */ + inline bool addPoint(DistanceType dist, IndexType index) { + if (dist < radius) + m_indices_dists.push_back(std::make_pair(index, dist)); + return true; + } + + inline DistanceType worstDist() const { return radius; } + + /** + * Find the worst result (furtherest neighbor) without copying or sorting + * Pre-conditions: size() > 0 + */ + std::pair worst_item() const { + if (m_indices_dists.empty()) + throw std::runtime_error("Cannot invoke RadiusResultSet::worst_item() on " + "an empty list of results."); + typedef + typename std::vector>::const_iterator + DistIt; + DistIt it = std::max_element(m_indices_dists.begin(), m_indices_dists.end(), + IndexDist_Sorter()); + return *it; + } +}; + +/** @} */ + +/** @addtogroup loadsave_grp Load/save auxiliary functions + * @{ */ +template +void save_value(FILE *stream, const T &value, size_t count = 1) { + fwrite(&value, sizeof(value), count, stream); +} + +template +void save_value(FILE *stream, const std::vector &value) { + size_t size = value.size(); + fwrite(&size, sizeof(size_t), 1, stream); + fwrite(&value[0], sizeof(T), size, stream); +} + +template +void load_value(FILE *stream, T &value, size_t count = 1) { + size_t read_cnt = fread(&value, sizeof(value), count, stream); + if (read_cnt != count) { + throw std::runtime_error("Cannot read from file"); + } +} + +template void load_value(FILE *stream, std::vector &value) { + size_t size; + size_t read_cnt = fread(&size, sizeof(size_t), 1, stream); + if (read_cnt != 1) { + throw std::runtime_error("Cannot read from file"); + } + value.resize(size); + read_cnt = fread(&value[0], sizeof(T), size, stream); + if (read_cnt != size) { + throw std::runtime_error("Cannot read from file"); + } +} +/** @} */ + +/** @addtogroup metric_grp Metric (distance) classes + * @{ */ + +struct Metric {}; + +/** Manhattan distance functor (generic version, optimized for + * high-dimensionality data sets). Corresponding distance traits: + * nanoflann::metric_L1 \tparam T Type of the elements (e.g. 
double, float, + * uint8_t) \tparam _DistanceType Type of distance variables (must be signed) + * (e.g. float, double, int64_t) + */ +template +struct L1_Adaptor { + typedef T ElementType; + typedef _DistanceType DistanceType; + + const DataSource &data_source; + + L1_Adaptor(const DataSource &_data_source) : data_source(_data_source) {} + + inline DistanceType evalMetric(const T *a, const size_t b_idx, size_t size, + DistanceType worst_dist = -1) const { + DistanceType result = DistanceType(); + const T *last = a + size; + const T *lastgroup = last - 3; + size_t d = 0; + + /* Process 4 items with each loop for efficiency. */ + while (a < lastgroup) { + const DistanceType diff0 = + std::abs(a[0] - data_source.kdtree_get_pt(b_idx, d++)); + const DistanceType diff1 = + std::abs(a[1] - data_source.kdtree_get_pt(b_idx, d++)); + const DistanceType diff2 = + std::abs(a[2] - data_source.kdtree_get_pt(b_idx, d++)); + const DistanceType diff3 = + std::abs(a[3] - data_source.kdtree_get_pt(b_idx, d++)); + result += diff0 + diff1 + diff2 + diff3; + a += 4; + if ((worst_dist > 0) && (result > worst_dist)) { + return result; + } + } + /* Process last 0-3 components. Not needed for standard vector lengths. */ + while (a < last) { + result += std::abs(*a++ - data_source.kdtree_get_pt(b_idx, d++)); + } + return result; + } + + template + inline DistanceType accum_dist(const U a, const V b, const size_t) const { + return std::abs(a - b); + } +}; + +/** Squared Euclidean distance functor (generic version, optimized for + * high-dimensionality data sets). Corresponding distance traits: + * nanoflann::metric_L2 \tparam T Type of the elements (e.g. double, float, + * uint8_t) \tparam _DistanceType Type of distance variables (must be signed) + * (e.g. float, double, int64_t) + */ +template +struct L2_Adaptor { + typedef T ElementType; + typedef _DistanceType DistanceType; + + const DataSource &data_source; + + L2_Adaptor(const DataSource &_data_source) : data_source(_data_source) {} + + inline DistanceType evalMetric(const T *a, const size_t b_idx, size_t size, + DistanceType worst_dist = -1) const { + DistanceType result = DistanceType(); + const T *last = a + size; + const T *lastgroup = last - 3; + size_t d = 0; + + /* Process 4 items with each loop for efficiency. */ + while (a < lastgroup) { + const DistanceType diff0 = a[0] - data_source.kdtree_get_pt(b_idx, d++); + const DistanceType diff1 = a[1] - data_source.kdtree_get_pt(b_idx, d++); + const DistanceType diff2 = a[2] - data_source.kdtree_get_pt(b_idx, d++); + const DistanceType diff3 = a[3] - data_source.kdtree_get_pt(b_idx, d++); + result += diff0 * diff0 + diff1 * diff1 + diff2 * diff2 + diff3 * diff3; + a += 4; + if ((worst_dist > 0) && (result > worst_dist)) { + return result; + } + } + /* Process last 0-3 components. Not needed for standard vector lengths. */ + while (a < last) { + const DistanceType diff0 = *a++ - data_source.kdtree_get_pt(b_idx, d++); + result += diff0 * diff0; + } + return result; + } + + template + inline DistanceType accum_dist(const U a, const V b, const size_t) const { + return (a - b) * (a - b); + } +}; + +/** Squared Euclidean (L2) distance functor (suitable for low-dimensionality + * datasets, like 2D or 3D point clouds) Corresponding distance traits: + * nanoflann::metric_L2_Simple \tparam T Type of the elements (e.g. double, + * float, uint8_t) \tparam _DistanceType Type of distance variables (must be + * signed) (e.g. 
float, double, int64_t) + */ +template +struct L2_Simple_Adaptor { + typedef T ElementType; + typedef _DistanceType DistanceType; + + const DataSource &data_source; + + L2_Simple_Adaptor(const DataSource &_data_source) + : data_source(_data_source) {} + + inline DistanceType evalMetric(const T *a, const size_t b_idx, + size_t size) const { + DistanceType result = DistanceType(); + for (size_t i = 0; i < size; ++i) { + const DistanceType diff = a[i] - data_source.kdtree_get_pt(b_idx, i); + result += diff * diff; + } + return result; + } + + template + inline DistanceType accum_dist(const U a, const V b, const size_t) const { + return (a - b) * (a - b); + } +}; + +/** SO2 distance functor + * Corresponding distance traits: nanoflann::metric_SO2 + * \tparam T Type of the elements (e.g. double, float) + * \tparam _DistanceType Type of distance variables (must be signed) (e.g. + * float, double) orientation is constrained to be in [-pi, pi] + */ +template +struct SO2_Adaptor { + typedef T ElementType; + typedef _DistanceType DistanceType; + + const DataSource &data_source; + + SO2_Adaptor(const DataSource &_data_source) : data_source(_data_source) {} + + inline DistanceType evalMetric(const T *a, const size_t b_idx, + size_t size) const { + return accum_dist(a[size - 1], data_source.kdtree_get_pt(b_idx, size - 1), + size - 1); + } + + /** Note: this assumes that input angles are already in the range [-pi,pi] */ + template + inline DistanceType accum_dist(const U a, const V b, const size_t) const { + DistanceType result = DistanceType(); + DistanceType PI = pi_const(); + result = b - a; + if (result > PI) + result -= 2 * PI; + else if (result < -PI) + result += 2 * PI; + return result; + } +}; + +/** SO3 distance functor (Uses L2_Simple) + * Corresponding distance traits: nanoflann::metric_SO3 + * \tparam T Type of the elements (e.g. double, float) + * \tparam _DistanceType Type of distance variables (must be signed) (e.g. 
+ * float, double) + */ +template +struct SO3_Adaptor { + typedef T ElementType; + typedef _DistanceType DistanceType; + + L2_Simple_Adaptor distance_L2_Simple; + + SO3_Adaptor(const DataSource &_data_source) + : distance_L2_Simple(_data_source) {} + + inline DistanceType evalMetric(const T *a, const size_t b_idx, + size_t size) const { + return distance_L2_Simple.evalMetric(a, b_idx, size); + } + + template + inline DistanceType accum_dist(const U a, const V b, const size_t idx) const { + return distance_L2_Simple.accum_dist(a, b, idx); + } +}; + +/** Metaprogramming helper traits class for the L1 (Manhattan) metric */ +struct metric_L1 : public Metric { + template struct traits { + typedef L1_Adaptor distance_t; + }; +}; +/** Metaprogramming helper traits class for the L2 (Euclidean) metric */ +struct metric_L2 : public Metric { + template struct traits { + typedef L2_Adaptor distance_t; + }; +}; +/** Metaprogramming helper traits class for the L2_simple (Euclidean) metric */ +struct metric_L2_Simple : public Metric { + template struct traits { + typedef L2_Simple_Adaptor distance_t; + }; +}; +/** Metaprogramming helper traits class for the SO3_InnerProdQuat metric */ +struct metric_SO2 : public Metric { + template struct traits { + typedef SO2_Adaptor distance_t; + }; +}; +/** Metaprogramming helper traits class for the SO3_InnerProdQuat metric */ +struct metric_SO3 : public Metric { + template struct traits { + typedef SO3_Adaptor distance_t; + }; +}; + +/** @} */ + +/** @addtogroup param_grp Parameter structs + * @{ */ + +/** Parameters (see README.md) */ +struct KDTreeSingleIndexAdaptorParams { + KDTreeSingleIndexAdaptorParams(size_t _leaf_max_size = 10) + : leaf_max_size(_leaf_max_size) {} + + size_t leaf_max_size; +}; + +/** Search options for KDTreeSingleIndexAdaptor::findNeighbors() */ +struct SearchParams { + /** Note: The first argument (checks_IGNORED_) is ignored, but kept for + * compatibility with the FLANN interface */ + SearchParams(int checks_IGNORED_ = 32, float eps_ = 0, bool sorted_ = true) + : checks(checks_IGNORED_), eps(eps_), sorted(sorted_) {} + + int checks; //!< Ignored parameter (Kept for compatibility with the FLANN + //!< interface). + float eps; //!< search for eps-approximate neighbours (default: 0) + bool sorted; //!< only for radius search, require neighbours sorted by + //!< distance (default: true) +}; +/** @} */ + +/** @addtogroup memalloc_grp Memory allocation + * @{ */ + +/** + * Allocates (using C's malloc) a generic type T. + * + * Params: + * count = number of instances to allocate. + * Returns: pointer (of type T*) to memory buffer + */ +template inline T *allocate(size_t count = 1) { + T *mem = static_cast(::malloc(sizeof(T) * count)); + return mem; +} + +/** + * Pooled storage allocator + * + * The following routines allow for the efficient allocation of storage in + * small chunks from a specified pool. Rather than allowing each structure + * to be freed individually, an entire pool of storage is freed at once. + * This method has two advantages over just using malloc() and free(). First, + * it is far more efficient for allocating small objects, as there is + * no overhead for remembering all the information needed to free each + * object or consolidating fragmented memory. Second, the decision about + * how long to keep an object is made at the time of allocation, and there + * is no need to track down all the objects to free them. 
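+ *
+ * A minimal sketch of the intended usage pattern (this comment is
+ * illustrative and not part of the original nanoflann header):
+ *
+ *   PooledAllocator pool;
+ *   Node *n = pool.allocate<Node>();  // carve a Node out of the current block
+ *   ...                               // no per-object free(); instead:
+ *   pool.free_all();                  // release every block at once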
+ * + */ + +const size_t WORDSIZE = 16; +const size_t BLOCKSIZE = 8192; + +class PooledAllocator { + /* We maintain memory alignment to word boundaries by requiring that all + allocations be in multiples of the machine wordsize. */ + /* Size of machine word in bytes. Must be power of 2. */ + /* Minimum number of bytes requested at a time from the system. Must be + * multiple of WORDSIZE. */ + + size_t remaining; /* Number of bytes left in current block of storage. */ + void *base; /* Pointer to base of current block of storage. */ + void *loc; /* Current location in block to next allocate memory. */ + + void internal_init() { + remaining = 0; + base = NULL; + usedMemory = 0; + wastedMemory = 0; + } + +public: + size_t usedMemory; + size_t wastedMemory; + + /** + Default constructor. Initializes a new pool. + */ + PooledAllocator() { internal_init(); } + + /** + * Destructor. Frees all the memory allocated in this pool. + */ + ~PooledAllocator() { free_all(); } + + /** Frees all allocated memory chunks */ + void free_all() { + while (base != NULL) { + void *prev = + *(static_cast(base)); /* Get pointer to prev block. */ + ::free(base); + base = prev; + } + internal_init(); + } + + /** + * Returns a pointer to a piece of new memory of the given size in bytes + * allocated from the pool. + */ + void *malloc(const size_t req_size) { + /* Round size up to a multiple of wordsize. The following expression + only works for WORDSIZE that is a power of 2, by masking last bits of + incremented size to zero. + */ + const size_t size = (req_size + (WORDSIZE - 1)) & ~(WORDSIZE - 1); + + /* Check whether a new block must be allocated. Note that the first word + of a block is reserved for a pointer to the previous block. + */ + if (size > remaining) { + + wastedMemory += remaining; + + /* Allocate new storage. */ + const size_t blocksize = + (size + sizeof(void *) + (WORDSIZE - 1) > BLOCKSIZE) + ? size + sizeof(void *) + (WORDSIZE - 1) + : BLOCKSIZE; + + // use the standard C malloc to allocate memory + void *m = ::malloc(blocksize); + if (!m) { + fprintf(stderr, "Failed to allocate memory.\n"); + return NULL; + } + + /* Fill first word of new block with pointer to previous block. */ + static_cast(m)[0] = base; + base = m; + + size_t shift = 0; + // int size_t = (WORDSIZE - ( (((size_t)m) + sizeof(void*)) & + // (WORDSIZE-1))) & (WORDSIZE-1); + + remaining = blocksize - sizeof(void *) - shift; + loc = (static_cast(m) + sizeof(void *) + shift); + } + void *rloc = loc; + loc = static_cast(loc) + size; + remaining -= size; + + usedMemory += size; + + return rloc; + } + + /** + * Allocates (using this pool) a generic type T. + * + * Params: + * count = number of instances to allocate. + * Returns: pointer (of type T*) to memory buffer + */ + template T *allocate(const size_t count = 1) { + T *mem = static_cast(this->malloc(sizeof(T) * count)); + return mem; + } +}; +/** @} */ + +/** @addtogroup nanoflann_metaprog_grp Auxiliary metaprogramming stuff + * @{ */ + +/** Used to declare fixed-size arrays when DIM>0, dynamically-allocated vectors + * when DIM=-1. Fixed size version for a generic DIM: + */ +template struct array_or_vector_selector { + typedef std::array container_t; +}; +/** Dynamic size version */ +template struct array_or_vector_selector<-1, T> { + typedef std::vector container_t; +}; + +/** @} */ + +/** kd-tree base-class + * + * Contains the member functions common to the classes KDTreeSingleIndexAdaptor + * and KDTreeSingleIndexDynamicAdaptor_. 
+ * + * \tparam Derived The name of the class which inherits this class. + * \tparam DatasetAdaptor The user-provided adaptor (see comments above). + * \tparam Distance The distance metric to use, these are all classes derived + * from nanoflann::Metric \tparam DIM Dimensionality of data points (e.g. 3 for + * 3D points) \tparam IndexType Will be typically size_t or int + */ + +template +class KDTreeBaseClass { + +public: + /** Frees the previously-built index. Automatically called within + * buildIndex(). */ + void freeIndex(Derived &obj) { + obj.pool.free_all(); + obj.root_node = NULL; + obj.m_size_at_index_build = 0; + } + + typedef typename Distance::ElementType ElementType; + typedef typename Distance::DistanceType DistanceType; + + /*--------------------- Internal Data Structures --------------------------*/ + struct Node { + /** Union used because a node can be either a LEAF node or a non-leaf node, + * so both data fields are never used simultaneously */ + union { + struct leaf { + IndexType left, right; //!< Indices of points in leaf node + } lr; + struct nonleaf { + int divfeat; //!< Dimension used for subdivision. + DistanceType divlow, divhigh; //!< The values used for subdivision. + } sub; + } node_type; + Node *child1, *child2; //!< Child nodes (both=NULL mean its a leaf node) + }; + + typedef Node *NodePtr; + + struct Interval { + ElementType low, high; + }; + + /** + * Array of indices to vectors in the dataset. + */ + std::vector vind; + + NodePtr root_node; + + size_t m_leaf_max_size; + + size_t m_size; //!< Number of current points in the dataset + size_t m_size_at_index_build; //!< Number of points in the dataset when the + //!< index was built + int dim; //!< Dimensionality of each data point + + /** Define "BoundingBox" as a fixed-size or variable-size container depending + * on "DIM" */ + typedef + typename array_or_vector_selector::container_t BoundingBox; + + /** Define "distance_vector_t" as a fixed-size or variable-size container + * depending on "DIM" */ + typedef typename array_or_vector_selector::container_t + distance_vector_t; + + /** The KD-tree used to find neighbours */ + + BoundingBox root_bbox; + + /** + * Pooled memory allocator. + * + * Using a pooled memory allocator is more efficient + * than allocating memory directly when there is a large + * number small of memory allocations. + */ + PooledAllocator pool; + + /** Returns number of points in dataset */ + size_t size(const Derived &obj) const { return obj.m_size; } + + /** Returns the length of each point in the dataset */ + size_t veclen(const Derived &obj) { + return static_cast(DIM > 0 ? 
DIM : obj.dim); + } + + /// Helper accessor to the dataset points: + inline ElementType dataset_get(const Derived &obj, size_t idx, + int component) const { + return obj.dataset.kdtree_get_pt(idx, component); + } + + /** + * Computes the inde memory usage + * Returns: memory used by the index + */ + size_t usedMemory(Derived &obj) { + return obj.pool.usedMemory + obj.pool.wastedMemory + + obj.dataset.kdtree_get_point_count() * + sizeof(IndexType); // pool memory and vind array memory + } + + void computeMinMax(const Derived &obj, IndexType *ind, IndexType count, + int element, ElementType &min_elem, + ElementType &max_elem) { + min_elem = dataset_get(obj, ind[0], element); + max_elem = dataset_get(obj, ind[0], element); + for (IndexType i = 1; i < count; ++i) { + ElementType val = dataset_get(obj, ind[i], element); + if (val < min_elem) + min_elem = val; + if (val > max_elem) + max_elem = val; + } + } + + /** + * Create a tree node that subdivides the list of vecs from vind[first] + * to vind[last]. The routine is called recursively on each sublist. + * + * @param left index of the first vector + * @param right index of the last vector + */ + NodePtr divideTree(Derived &obj, const IndexType left, const IndexType right, + BoundingBox &bbox) { + NodePtr node = obj.pool.template allocate(); // allocate memory + + /* If too few exemplars remain, then make this a leaf node. */ + if ((right - left) <= static_cast(obj.m_leaf_max_size)) { + node->child1 = node->child2 = NULL; /* Mark as leaf node. */ + node->node_type.lr.left = left; + node->node_type.lr.right = right; + + // compute bounding-box of leaf points + for (int i = 0; i < (DIM > 0 ? DIM : obj.dim); ++i) { + bbox[i].low = dataset_get(obj, obj.vind[left], i); + bbox[i].high = dataset_get(obj, obj.vind[left], i); + } + for (IndexType k = left + 1; k < right; ++k) { + for (int i = 0; i < (DIM > 0 ? DIM : obj.dim); ++i) { + if (bbox[i].low > dataset_get(obj, obj.vind[k], i)) + bbox[i].low = dataset_get(obj, obj.vind[k], i); + if (bbox[i].high < dataset_get(obj, obj.vind[k], i)) + bbox[i].high = dataset_get(obj, obj.vind[k], i); + } + } + } else { + IndexType idx; + int cutfeat; + DistanceType cutval; + middleSplit_(obj, &obj.vind[0] + left, right - left, idx, cutfeat, cutval, + bbox); + + node->node_type.sub.divfeat = cutfeat; + + BoundingBox left_bbox(bbox); + left_bbox[cutfeat].high = cutval; + node->child1 = divideTree(obj, left, left + idx, left_bbox); + + BoundingBox right_bbox(bbox); + right_bbox[cutfeat].low = cutval; + node->child2 = divideTree(obj, left + idx, right, right_bbox); + + node->node_type.sub.divlow = left_bbox[cutfeat].high; + node->node_type.sub.divhigh = right_bbox[cutfeat].low; + + for (int i = 0; i < (DIM > 0 ? DIM : obj.dim); ++i) { + bbox[i].low = std::min(left_bbox[i].low, right_bbox[i].low); + bbox[i].high = std::max(left_bbox[i].high, right_bbox[i].high); + } + } + + return node; + } + + void middleSplit_(Derived &obj, IndexType *ind, IndexType count, + IndexType &index, int &cutfeat, DistanceType &cutval, + const BoundingBox &bbox) { + const DistanceType EPS = static_cast(0.00001); + ElementType max_span = bbox[0].high - bbox[0].low; + for (int i = 1; i < (DIM > 0 ? DIM : obj.dim); ++i) { + ElementType span = bbox[i].high - bbox[i].low; + if (span > max_span) { + max_span = span; + } + } + ElementType max_spread = -1; + cutfeat = 0; + for (int i = 0; i < (DIM > 0 ? 
DIM : obj.dim); ++i) { + ElementType span = bbox[i].high - bbox[i].low; + if (span > (1 - EPS) * max_span) { + ElementType min_elem, max_elem; + computeMinMax(obj, ind, count, i, min_elem, max_elem); + ElementType spread = max_elem - min_elem; + ; + if (spread > max_spread) { + cutfeat = i; + max_spread = spread; + } + } + } + // split in the middle + DistanceType split_val = (bbox[cutfeat].low + bbox[cutfeat].high) / 2; + ElementType min_elem, max_elem; + computeMinMax(obj, ind, count, cutfeat, min_elem, max_elem); + + if (split_val < min_elem) + cutval = min_elem; + else if (split_val > max_elem) + cutval = max_elem; + else + cutval = split_val; + + IndexType lim1, lim2; + planeSplit(obj, ind, count, cutfeat, cutval, lim1, lim2); + + if (lim1 > count / 2) + index = lim1; + else if (lim2 < count / 2) + index = lim2; + else + index = count / 2; + } + + /** + * Subdivide the list of points by a plane perpendicular on axe corresponding + * to the 'cutfeat' dimension at 'cutval' position. + * + * On return: + * dataset[ind[0..lim1-1]][cutfeat]cutval + */ + void planeSplit(Derived &obj, IndexType *ind, const IndexType count, + int cutfeat, DistanceType &cutval, IndexType &lim1, + IndexType &lim2) { + /* Move vector indices for left subtree to front of list. */ + IndexType left = 0; + IndexType right = count - 1; + for (;;) { + while (left <= right && dataset_get(obj, ind[left], cutfeat) < cutval) + ++left; + while (right && left <= right && + dataset_get(obj, ind[right], cutfeat) >= cutval) + --right; + if (left > right || !right) + break; // "!right" was added to support unsigned Index types + std::swap(ind[left], ind[right]); + ++left; + --right; + } + /* If either list is empty, it means that all remaining features + * are identical. Split in the middle to maintain a balanced tree. + */ + lim1 = left; + right = count - 1; + for (;;) { + while (left <= right && dataset_get(obj, ind[left], cutfeat) <= cutval) + ++left; + while (right && left <= right && + dataset_get(obj, ind[right], cutfeat) > cutval) + --right; + if (left > right || !right) + break; // "!right" was added to support unsigned Index types + std::swap(ind[left], ind[right]); + ++left; + --right; + } + lim2 = left; + } + + DistanceType computeInitialDistances(const Derived &obj, + const ElementType *vec, + distance_vector_t &dists) const { + assert(vec); + DistanceType distsq = DistanceType(); + + for (int i = 0; i < (DIM > 0 ? DIM : obj.dim); ++i) { + if (vec[i] < obj.root_bbox[i].low) { + dists[i] = obj.distance.accum_dist(vec[i], obj.root_bbox[i].low, i); + distsq += dists[i]; + } + if (vec[i] > obj.root_bbox[i].high) { + dists[i] = obj.distance.accum_dist(vec[i], obj.root_bbox[i].high, i); + distsq += dists[i]; + } + } + return distsq; + } + + void save_tree(Derived &obj, FILE *stream, NodePtr tree) { + save_value(stream, *tree); + if (tree->child1 != NULL) { + save_tree(obj, stream, tree->child1); + } + if (tree->child2 != NULL) { + save_tree(obj, stream, tree->child2); + } + } + + void load_tree(Derived &obj, FILE *stream, NodePtr &tree) { + tree = obj.pool.template allocate(); + load_value(stream, *tree); + if (tree->child1 != NULL) { + load_tree(obj, stream, tree->child1); + } + if (tree->child2 != NULL) { + load_tree(obj, stream, tree->child2); + } + } + + /** Stores the index in a binary file. + * IMPORTANT NOTE: The set of data points is NOT stored in the file, so when + * loading the index object it must be constructed associated to the same + * source of data points used while building it. 
See the example: + * examples/saveload_example.cpp \sa loadIndex */ + void saveIndex_(Derived &obj, FILE *stream) { + save_value(stream, obj.m_size); + save_value(stream, obj.dim); + save_value(stream, obj.root_bbox); + save_value(stream, obj.m_leaf_max_size); + save_value(stream, obj.vind); + save_tree(obj, stream, obj.root_node); + } + + /** Loads a previous index from a binary file. + * IMPORTANT NOTE: The set of data points is NOT stored in the file, so the + * index object must be constructed associated to the same source of data + * points used while building the index. See the example: + * examples/saveload_example.cpp \sa loadIndex */ + void loadIndex_(Derived &obj, FILE *stream) { + load_value(stream, obj.m_size); + load_value(stream, obj.dim); + load_value(stream, obj.root_bbox); + load_value(stream, obj.m_leaf_max_size); + load_value(stream, obj.vind); + load_tree(obj, stream, obj.root_node); + } +}; + +/** @addtogroup kdtrees_grp KD-tree classes and adaptors + * @{ */ + +/** kd-tree static index + * + * Contains the k-d trees and other information for indexing a set of points + * for nearest-neighbor matching. + * + * The class "DatasetAdaptor" must provide the following interface (can be + * non-virtual, inlined methods): + * + * \code + * // Must return the number of data poins + * inline size_t kdtree_get_point_count() const { ... } + * + * + * // Must return the dim'th component of the idx'th point in the class: + * inline T kdtree_get_pt(const size_t idx, const size_t dim) const { ... } + * + * // Optional bounding-box computation: return false to default to a standard + * bbox computation loop. + * // Return true if the BBOX was already computed by the class and returned + * in "bb" so it can be avoided to redo it again. + * // Look at bb.size() to find out the expected dimensionality (e.g. 2 or 3 + * for point clouds) template bool kdtree_get_bbox(BBOX &bb) const + * { + * bb[0].low = ...; bb[0].high = ...; // 0th dimension limits + * bb[1].low = ...; bb[1].high = ...; // 1st dimension limits + * ... + * return true; + * } + * + * \endcode + * + * \tparam DatasetAdaptor The user-provided adaptor (see comments above). + * \tparam Distance The distance metric to use: nanoflann::metric_L1, + * nanoflann::metric_L2, nanoflann::metric_L2_Simple, etc. \tparam DIM + * Dimensionality of data points (e.g. 
3 for 3D points) \tparam IndexType Will + * be typically size_t or int + */ +template +class KDTreeSingleIndexAdaptor + : public KDTreeBaseClass< + KDTreeSingleIndexAdaptor, + Distance, DatasetAdaptor, DIM, IndexType> { +public: + /** Deleted copy constructor*/ + KDTreeSingleIndexAdaptor( + const KDTreeSingleIndexAdaptor + &) = delete; + + /** + * The dataset used by this index + */ + const DatasetAdaptor &dataset; //!< The source of our data + + const KDTreeSingleIndexAdaptorParams index_params; + + Distance distance; + + typedef typename nanoflann::KDTreeBaseClass< + nanoflann::KDTreeSingleIndexAdaptor, + Distance, DatasetAdaptor, DIM, IndexType> + BaseClassRef; + + typedef typename BaseClassRef::ElementType ElementType; + typedef typename BaseClassRef::DistanceType DistanceType; + + typedef typename BaseClassRef::Node Node; + typedef Node *NodePtr; + + typedef typename BaseClassRef::Interval Interval; + /** Define "BoundingBox" as a fixed-size or variable-size container depending + * on "DIM" */ + typedef typename BaseClassRef::BoundingBox BoundingBox; + + /** Define "distance_vector_t" as a fixed-size or variable-size container + * depending on "DIM" */ + typedef typename BaseClassRef::distance_vector_t distance_vector_t; + + /** + * KDTree constructor + * + * Refer to docs in README.md or online in + * https://github.com/jlblancoc/nanoflann + * + * The KD-Tree point dimension (the length of each point in the datase, e.g. 3 + * for 3D points) is determined by means of: + * - The \a DIM template parameter if >0 (highest priority) + * - Otherwise, the \a dimensionality parameter of this constructor. + * + * @param inputData Dataset with the input features + * @param params Basically, the maximum leaf node size + */ + KDTreeSingleIndexAdaptor(const int dimensionality, + const DatasetAdaptor &inputData, + const KDTreeSingleIndexAdaptorParams ¶ms = + KDTreeSingleIndexAdaptorParams()) + : dataset(inputData), index_params(params), distance(inputData) { + BaseClassRef::root_node = NULL; + BaseClassRef::m_size = dataset.kdtree_get_point_count(); + BaseClassRef::m_size_at_index_build = BaseClassRef::m_size; + BaseClassRef::dim = dimensionality; + if (DIM > 0) + BaseClassRef::dim = DIM; + BaseClassRef::m_leaf_max_size = params.leaf_max_size; + + // Create a permutable array of indices to the input vectors. + init_vind(); + } + + /** + * Builds the index + */ + void buildIndex() { + BaseClassRef::m_size = dataset.kdtree_get_point_count(); + BaseClassRef::m_size_at_index_build = BaseClassRef::m_size; + init_vind(); + this->freeIndex(*this); + BaseClassRef::m_size_at_index_build = BaseClassRef::m_size; + if (BaseClassRef::m_size == 0) + return; + computeBoundingBox(BaseClassRef::root_bbox); + BaseClassRef::root_node = + this->divideTree(*this, 0, BaseClassRef::m_size, + BaseClassRef::root_bbox); // construct the tree + } + + /** \name Query methods + * @{ */ + + /** + * Find set of nearest neighbors to vec[0:dim-1]. Their indices are stored + * inside the result object. + * + * Params: + * result = the result object in which the indices of the + * nearest-neighbors are stored vec = the vector for which to search the + * nearest neighbors + * + * \tparam RESULTSET Should be any ResultSet + * \return True if the requested neighbors could be found. 
+ * \sa knnSearch, radiusSearch + */ + template + bool findNeighbors(RESULTSET &result, const ElementType *vec, + const SearchParams &searchParams) const { + assert(vec); + if (this->size(*this) == 0) + return false; + if (!BaseClassRef::root_node) + throw std::runtime_error( + "[nanoflann] findNeighbors() called before building the index."); + float epsError = 1 + searchParams.eps; + + distance_vector_t + dists; // fixed or variable-sized container (depending on DIM) + auto zero = static_cast(0); + assign(dists, (DIM > 0 ? DIM : BaseClassRef::dim), + zero); // Fill it with zeros. + DistanceType distsq = this->computeInitialDistances(*this, vec, dists); + searchLevel(result, vec, BaseClassRef::root_node, distsq, dists, + epsError); // "count_leaf" parameter removed since was neither + // used nor returned to the user. + return result.full(); + } + + /** + * Find the "num_closest" nearest neighbors to the \a query_point[0:dim-1]. + * Their indices are stored inside the result object. \sa radiusSearch, + * findNeighbors \note nChecks_IGNORED is ignored but kept for compatibility + * with the original FLANN interface. \return Number `N` of valid points in + * the result set. Only the first `N` entries in `out_indices` and + * `out_distances_sq` will be valid. Return may be less than `num_closest` + * only if the number of elements in the tree is less than `num_closest`. + */ + size_t knnSearch(const ElementType *query_point, const size_t num_closest, + IndexType *out_indices, DistanceType *out_distances_sq, + const int /* nChecks_IGNORED */ = 10) const { + nanoflann::KNNResultSet resultSet(num_closest); + resultSet.init(out_indices, out_distances_sq); + this->findNeighbors(resultSet, query_point, nanoflann::SearchParams()); + return resultSet.size(); + } + + /** + * Find all the neighbors to \a query_point[0:dim-1] within a maximum radius. + * The output is given as a vector of pairs, of which the first element is a + * point index and the second the corresponding distance. Previous contents of + * \a IndicesDists are cleared. + * + * If searchParams.sorted==true, the output list is sorted by ascending + * distances. + * + * For a better performance, it is advisable to do a .reserve() on the vector + * if you have any wild guess about the number of expected matches. + * + * \sa knnSearch, findNeighbors, radiusSearchCustomCallback + * \return The number of points within the given radius (i.e. indices.size() + * or dists.size() ) + */ + size_t + radiusSearch(const ElementType *query_point, const DistanceType &radius, + std::vector> &IndicesDists, + const SearchParams &searchParams) const { + RadiusResultSet resultSet(radius, IndicesDists); + const size_t nFound = + radiusSearchCustomCallback(query_point, resultSet, searchParams); + if (searchParams.sorted) + std::sort(IndicesDists.begin(), IndicesDists.end(), IndexDist_Sorter()); + return nFound; + } + + /** + * Just like radiusSearch() but with a custom callback class for each point + * found in the radius of the query. See the source of RadiusResultSet<> as a + * start point for your own classes. \sa radiusSearch + */ + template + size_t radiusSearchCustomCallback( + const ElementType *query_point, SEARCH_CALLBACK &resultSet, + const SearchParams &searchParams = SearchParams()) const { + this->findNeighbors(resultSet, query_point, searchParams); + return resultSet.size(); + } + + /** @} */ + +public: + /** Make sure the auxiliary list \a vind has the same size than the current + * dataset, and re-generate if size has changed. 
*/ + void init_vind() { + // Create a permutable array of indices to the input vectors. + BaseClassRef::m_size = dataset.kdtree_get_point_count(); + if (BaseClassRef::vind.size() != BaseClassRef::m_size) + BaseClassRef::vind.resize(BaseClassRef::m_size); + for (size_t i = 0; i < BaseClassRef::m_size; i++) + BaseClassRef::vind[i] = i; + } + + void computeBoundingBox(BoundingBox &bbox) { + resize(bbox, (DIM > 0 ? DIM : BaseClassRef::dim)); + if (dataset.kdtree_get_bbox(bbox)) { + // Done! It was implemented in derived class + } else { + const size_t N = dataset.kdtree_get_point_count(); + if (!N) + throw std::runtime_error("[nanoflann] computeBoundingBox() called but " + "no data points found."); + for (int i = 0; i < (DIM > 0 ? DIM : BaseClassRef::dim); ++i) { + bbox[i].low = bbox[i].high = this->dataset_get(*this, 0, i); + } + for (size_t k = 1; k < N; ++k) { + for (int i = 0; i < (DIM > 0 ? DIM : BaseClassRef::dim); ++i) { + if (this->dataset_get(*this, k, i) < bbox[i].low) + bbox[i].low = this->dataset_get(*this, k, i); + if (this->dataset_get(*this, k, i) > bbox[i].high) + bbox[i].high = this->dataset_get(*this, k, i); + } + } + } + } + + /** + * Performs an exact search in the tree starting from a node. + * \tparam RESULTSET Should be any ResultSet + * \return true if the search should be continued, false if the results are + * sufficient + */ + template + bool searchLevel(RESULTSET &result_set, const ElementType *vec, + const NodePtr node, DistanceType mindistsq, + distance_vector_t &dists, const float epsError) const { + /* If this is a leaf node, then do check and return. */ + if ((node->child1 == NULL) && (node->child2 == NULL)) { + // count_leaf += (node->lr.right-node->lr.left); // Removed since was + // neither used nor returned to the user. + DistanceType worst_dist = result_set.worstDist(); + for (IndexType i = node->node_type.lr.left; i < node->node_type.lr.right; + ++i) { + const IndexType index = BaseClassRef::vind[i]; // reorder... : i; + DistanceType dist = distance.evalMetric( + vec, index, (DIM > 0 ? DIM : BaseClassRef::dim)); + if (dist < worst_dist) { + if (!result_set.addPoint(dist, BaseClassRef::vind[i])) { + // the resultset doesn't want to receive any more points, we're done + // searching! + return false; + } + } + } + return true; + } + + /* Which child branch should be taken first? */ + int idx = node->node_type.sub.divfeat; + ElementType val = vec[idx]; + DistanceType diff1 = val - node->node_type.sub.divlow; + DistanceType diff2 = val - node->node_type.sub.divhigh; + + NodePtr bestChild; + NodePtr otherChild; + DistanceType cut_dist; + if ((diff1 + diff2) < 0) { + bestChild = node->child1; + otherChild = node->child2; + cut_dist = distance.accum_dist(val, node->node_type.sub.divhigh, idx); + } else { + bestChild = node->child2; + otherChild = node->child1; + cut_dist = distance.accum_dist(val, node->node_type.sub.divlow, idx); + } + + /* Call recursively to search next level down. */ + if (!searchLevel(result_set, vec, bestChild, mindistsq, dists, epsError)) { + // the resultset doesn't want to receive any more points, we're done + // searching! + return false; + } + + DistanceType dst = dists[idx]; + mindistsq = mindistsq + cut_dist - dst; + dists[idx] = cut_dist; + if (mindistsq * epsError <= result_set.worstDist()) { + if (!searchLevel(result_set, vec, otherChild, mindistsq, dists, + epsError)) { + // the resultset doesn't want to receive any more points, we're done + // searching! 
+ return false; + } + } + dists[idx] = dst; + return true; + } + +public: + /** Stores the index in a binary file. + * IMPORTANT NOTE: The set of data points is NOT stored in the file, so when + * loading the index object it must be constructed associated to the same + * source of data points used while building it. See the example: + * examples/saveload_example.cpp \sa loadIndex */ + void saveIndex(FILE *stream) { this->saveIndex_(*this, stream); } + + /** Loads a previous index from a binary file. + * IMPORTANT NOTE: The set of data points is NOT stored in the file, so the + * index object must be constructed associated to the same source of data + * points used while building the index. See the example: + * examples/saveload_example.cpp \sa loadIndex */ + void loadIndex(FILE *stream) { this->loadIndex_(*this, stream); } + +}; // class KDTree + +/** kd-tree dynamic index + * + * Contains the k-d trees and other information for indexing a set of points + * for nearest-neighbor matching. + * + * The class "DatasetAdaptor" must provide the following interface (can be + * non-virtual, inlined methods): + * + * \code + * // Must return the number of data poins + * inline size_t kdtree_get_point_count() const { ... } + * + * // Must return the dim'th component of the idx'th point in the class: + * inline T kdtree_get_pt(const size_t idx, const size_t dim) const { ... } + * + * // Optional bounding-box computation: return false to default to a standard + * bbox computation loop. + * // Return true if the BBOX was already computed by the class and returned + * in "bb" so it can be avoided to redo it again. + * // Look at bb.size() to find out the expected dimensionality (e.g. 2 or 3 + * for point clouds) template bool kdtree_get_bbox(BBOX &bb) const + * { + * bb[0].low = ...; bb[0].high = ...; // 0th dimension limits + * bb[1].low = ...; bb[1].high = ...; // 1st dimension limits + * ... + * return true; + * } + * + * \endcode + * + * \tparam DatasetAdaptor The user-provided adaptor (see comments above). + * \tparam Distance The distance metric to use: nanoflann::metric_L1, + * nanoflann::metric_L2, nanoflann::metric_L2_Simple, etc. \tparam DIM + * Dimensionality of data points (e.g. 
3 for 3D points) \tparam IndexType Will + * be typically size_t or int + */ +template +class KDTreeSingleIndexDynamicAdaptor_ + : public KDTreeBaseClass, + Distance, DatasetAdaptor, DIM, IndexType> { +public: + /** + * The dataset used by this index + */ + const DatasetAdaptor &dataset; //!< The source of our data + + KDTreeSingleIndexAdaptorParams index_params; + + std::vector &treeIndex; + + Distance distance; + + typedef typename nanoflann::KDTreeBaseClass< + nanoflann::KDTreeSingleIndexDynamicAdaptor_, + Distance, DatasetAdaptor, DIM, IndexType> + BaseClassRef; + + typedef typename BaseClassRef::ElementType ElementType; + typedef typename BaseClassRef::DistanceType DistanceType; + + typedef typename BaseClassRef::Node Node; + typedef Node *NodePtr; + + typedef typename BaseClassRef::Interval Interval; + /** Define "BoundingBox" as a fixed-size or variable-size container depending + * on "DIM" */ + typedef typename BaseClassRef::BoundingBox BoundingBox; + + /** Define "distance_vector_t" as a fixed-size or variable-size container + * depending on "DIM" */ + typedef typename BaseClassRef::distance_vector_t distance_vector_t; + + /** + * KDTree constructor + * + * Refer to docs in README.md or online in + * https://github.com/jlblancoc/nanoflann + * + * The KD-Tree point dimension (the length of each point in the datase, e.g. 3 + * for 3D points) is determined by means of: + * - The \a DIM template parameter if >0 (highest priority) + * - Otherwise, the \a dimensionality parameter of this constructor. + * + * @param inputData Dataset with the input features + * @param params Basically, the maximum leaf node size + */ + KDTreeSingleIndexDynamicAdaptor_( + const int dimensionality, const DatasetAdaptor &inputData, + std::vector &treeIndex_, + const KDTreeSingleIndexAdaptorParams ¶ms = + KDTreeSingleIndexAdaptorParams()) + : dataset(inputData), index_params(params), treeIndex(treeIndex_), + distance(inputData) { + BaseClassRef::root_node = NULL; + BaseClassRef::m_size = 0; + BaseClassRef::m_size_at_index_build = 0; + BaseClassRef::dim = dimensionality; + if (DIM > 0) + BaseClassRef::dim = DIM; + BaseClassRef::m_leaf_max_size = params.leaf_max_size; + } + + /** Assignment operator definiton */ + KDTreeSingleIndexDynamicAdaptor_ + operator=(const KDTreeSingleIndexDynamicAdaptor_ &rhs) { + KDTreeSingleIndexDynamicAdaptor_ tmp(rhs); + std::swap(BaseClassRef::vind, tmp.BaseClassRef::vind); + std::swap(BaseClassRef::m_leaf_max_size, tmp.BaseClassRef::m_leaf_max_size); + std::swap(index_params, tmp.index_params); + std::swap(treeIndex, tmp.treeIndex); + std::swap(BaseClassRef::m_size, tmp.BaseClassRef::m_size); + std::swap(BaseClassRef::m_size_at_index_build, + tmp.BaseClassRef::m_size_at_index_build); + std::swap(BaseClassRef::root_node, tmp.BaseClassRef::root_node); + std::swap(BaseClassRef::root_bbox, tmp.BaseClassRef::root_bbox); + std::swap(BaseClassRef::pool, tmp.BaseClassRef::pool); + return *this; + } + + /** + * Builds the index + */ + void buildIndex() { + BaseClassRef::m_size = BaseClassRef::vind.size(); + this->freeIndex(*this); + BaseClassRef::m_size_at_index_build = BaseClassRef::m_size; + if (BaseClassRef::m_size == 0) + return; + computeBoundingBox(BaseClassRef::root_bbox); + BaseClassRef::root_node = + this->divideTree(*this, 0, BaseClassRef::m_size, + BaseClassRef::root_bbox); // construct the tree + } + + /** \name Query methods + * @{ */ + + /** + * Find set of nearest neighbors to vec[0:dim-1]. Their indices are stored + * inside the result object. 
+ * + * Params: + * result = the result object in which the indices of the + * nearest-neighbors are stored vec = the vector for which to search the + * nearest neighbors + * + * \tparam RESULTSET Should be any ResultSet + * \return True if the requested neighbors could be found. + * \sa knnSearch, radiusSearch + */ + template + bool findNeighbors(RESULTSET &result, const ElementType *vec, + const SearchParams &searchParams) const { + assert(vec); + if (this->size(*this) == 0) + return false; + if (!BaseClassRef::root_node) + return false; + float epsError = 1 + searchParams.eps; + + // fixed or variable-sized container (depending on DIM) + distance_vector_t dists; + // Fill it with zeros. + assign(dists, (DIM > 0 ? DIM : BaseClassRef::dim), + static_cast(0)); + DistanceType distsq = this->computeInitialDistances(*this, vec, dists); + searchLevel(result, vec, BaseClassRef::root_node, distsq, dists, + epsError); // "count_leaf" parameter removed since was neither + // used nor returned to the user. + return result.full(); + } + + /** + * Find the "num_closest" nearest neighbors to the \a query_point[0:dim-1]. + * Their indices are stored inside the result object. \sa radiusSearch, + * findNeighbors \note nChecks_IGNORED is ignored but kept for compatibility + * with the original FLANN interface. \return Number `N` of valid points in + * the result set. Only the first `N` entries in `out_indices` and + * `out_distances_sq` will be valid. Return may be less than `num_closest` + * only if the number of elements in the tree is less than `num_closest`. + */ + size_t knnSearch(const ElementType *query_point, const size_t num_closest, + IndexType *out_indices, DistanceType *out_distances_sq, + const int /* nChecks_IGNORED */ = 10) const { + nanoflann::KNNResultSet resultSet(num_closest); + resultSet.init(out_indices, out_distances_sq); + this->findNeighbors(resultSet, query_point, nanoflann::SearchParams()); + return resultSet.size(); + } + + /** + * Find all the neighbors to \a query_point[0:dim-1] within a maximum radius. + * The output is given as a vector of pairs, of which the first element is a + * point index and the second the corresponding distance. Previous contents of + * \a IndicesDists are cleared. + * + * If searchParams.sorted==true, the output list is sorted by ascending + * distances. + * + * For a better performance, it is advisable to do a .reserve() on the vector + * if you have any wild guess about the number of expected matches. + * + * \sa knnSearch, findNeighbors, radiusSearchCustomCallback + * \return The number of points within the given radius (i.e. indices.size() + * or dists.size() ) + */ + size_t + radiusSearch(const ElementType *query_point, const DistanceType &radius, + std::vector> &IndicesDists, + const SearchParams &searchParams) const { + RadiusResultSet resultSet(radius, IndicesDists); + const size_t nFound = + radiusSearchCustomCallback(query_point, resultSet, searchParams); + if (searchParams.sorted) + std::sort(IndicesDists.begin(), IndicesDists.end(), IndexDist_Sorter()); + return nFound; + } + + /** + * Just like radiusSearch() but with a custom callback class for each point + * found in the radius of the query. See the source of RadiusResultSet<> as a + * start point for your own classes. 
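+   *
+   * A hedged sketch of such a callback (this struct is illustrative, not
+   * part of nanoflann; it only mirrors the ResultSet interface that
+   * searchLevel() relies on: full(), addPoint() and worstDist()):
+   * \code
+   * struct CountingCallback {
+   *   DistanceType radius;
+   *   size_t count;
+   *   explicit CountingCallback(DistanceType r) : radius(r), count(0) {}
+   *   bool full() const { return true; }
+   *   bool addPoint(DistanceType, IndexType) { ++count; return true; }
+   *   DistanceType worstDist() const { return radius; }
+   * };
+   * \endcode
+   *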
\sa radiusSearch + */ + template + size_t radiusSearchCustomCallback( + const ElementType *query_point, SEARCH_CALLBACK &resultSet, + const SearchParams &searchParams = SearchParams()) const { + this->findNeighbors(resultSet, query_point, searchParams); + return resultSet.size(); + } + + /** @} */ + +public: + void computeBoundingBox(BoundingBox &bbox) { + resize(bbox, (DIM > 0 ? DIM : BaseClassRef::dim)); + + if (dataset.kdtree_get_bbox(bbox)) { + // Done! It was implemented in derived class + } else { + const size_t N = BaseClassRef::m_size; + if (!N) + throw std::runtime_error("[nanoflann] computeBoundingBox() called but " + "no data points found."); + for (int i = 0; i < (DIM > 0 ? DIM : BaseClassRef::dim); ++i) { + bbox[i].low = bbox[i].high = + this->dataset_get(*this, BaseClassRef::vind[0], i); + } + for (size_t k = 1; k < N; ++k) { + for (int i = 0; i < (DIM > 0 ? DIM : BaseClassRef::dim); ++i) { + if (this->dataset_get(*this, BaseClassRef::vind[k], i) < bbox[i].low) + bbox[i].low = this->dataset_get(*this, BaseClassRef::vind[k], i); + if (this->dataset_get(*this, BaseClassRef::vind[k], i) > bbox[i].high) + bbox[i].high = this->dataset_get(*this, BaseClassRef::vind[k], i); + } + } + } + } + + /** + * Performs an exact search in the tree starting from a node. + * \tparam RESULTSET Should be any ResultSet + */ + template + void searchLevel(RESULTSET &result_set, const ElementType *vec, + const NodePtr node, DistanceType mindistsq, + distance_vector_t &dists, const float epsError) const { + /* If this is a leaf node, then do check and return. */ + if ((node->child1 == NULL) && (node->child2 == NULL)) { + // count_leaf += (node->lr.right-node->lr.left); // Removed since was + // neither used nor returned to the user. + DistanceType worst_dist = result_set.worstDist(); + for (IndexType i = node->node_type.lr.left; i < node->node_type.lr.right; + ++i) { + const IndexType index = BaseClassRef::vind[i]; // reorder... : i; + if (treeIndex[index] == -1) + continue; + DistanceType dist = distance.evalMetric( + vec, index, (DIM > 0 ? DIM : BaseClassRef::dim)); + if (dist < worst_dist) { + if (!result_set.addPoint( + static_cast(dist), + static_cast( + BaseClassRef::vind[i]))) { + // the resultset doesn't want to receive any more points, we're done + // searching! + return; // false; + } + } + } + return; + } + + /* Which child branch should be taken first? */ + int idx = node->node_type.sub.divfeat; + ElementType val = vec[idx]; + DistanceType diff1 = val - node->node_type.sub.divlow; + DistanceType diff2 = val - node->node_type.sub.divhigh; + + NodePtr bestChild; + NodePtr otherChild; + DistanceType cut_dist; + if ((diff1 + diff2) < 0) { + bestChild = node->child1; + otherChild = node->child2; + cut_dist = distance.accum_dist(val, node->node_type.sub.divhigh, idx); + } else { + bestChild = node->child2; + otherChild = node->child1; + cut_dist = distance.accum_dist(val, node->node_type.sub.divlow, idx); + } + + /* Call recursively to search next level down. */ + searchLevel(result_set, vec, bestChild, mindistsq, dists, epsError); + + DistanceType dst = dists[idx]; + mindistsq = mindistsq + cut_dist - dst; + dists[idx] = cut_dist; + if (mindistsq * epsError <= result_set.worstDist()) { + searchLevel(result_set, vec, otherChild, mindistsq, dists, epsError); + } + dists[idx] = dst; + } + +public: + /** Stores the index in a binary file. 
+   * IMPORTANT NOTE: The set of data points is NOT stored in the file, so when
+   * loading the index object it must be constructed associated to the same
+   * source of data points used while building it. See the example:
+   * examples/saveload_example.cpp \sa loadIndex */
+  void saveIndex(FILE *stream) { this->saveIndex_(*this, stream); }
+
+  /** Loads a previous index from a binary file.
+   * IMPORTANT NOTE: The set of data points is NOT stored in the file, so the
+   * index object must be constructed associated to the same source of data
+   * points used while building the index. See the example:
+   * examples/saveload_example.cpp \sa loadIndex */
+  void loadIndex(FILE *stream) { this->loadIndex_(*this, stream); }
+};
+
+/** kd-tree dynamic index
+ *
+ * Class that creates multiple static indices and merges their results so the
+ * whole behaves as a single dynamic index, as proposed in the logarithmic
+ * method.
+ *
+ * Example of usage:
+ * examples/dynamic_pointcloud_example.cpp
+ *
+ * \tparam DatasetAdaptor The user-provided adaptor (see comments above).
+ * \tparam Distance The distance metric to use: nanoflann::metric_L1,
+ *         nanoflann::metric_L2, nanoflann::metric_L2_Simple, etc.
+ * \tparam DIM Dimensionality of data points (e.g. 3 for 3D points)
+ * \tparam IndexType Will be typically size_t or int
+ */
+template <typename Distance, class DatasetAdaptor, int DIM = -1,
+          typename IndexType = size_t>
+class KDTreeSingleIndexDynamicAdaptor {
+public:
+  typedef typename Distance::ElementType ElementType;
+  typedef typename Distance::DistanceType DistanceType;
+
+protected:
+  size_t m_leaf_max_size;
+  size_t treeCount;
+  size_t pointCount;
+
+  /**
+   * The dataset used by this index
+   */
+  const DatasetAdaptor &dataset; //!< The source of our data
+
+  std::vector<int> treeIndex; //!< treeIndex[idx] is the index of tree in which
+                              //!< point at idx is stored. treeIndex[idx]=-1
+                              //!< means that point has been removed.
+
+  KDTreeSingleIndexAdaptorParams index_params;
+
+  int dim; //!< Dimensionality of each data point
+
+  typedef KDTreeSingleIndexDynamicAdaptor_<Distance, DatasetAdaptor, DIM>
+      index_container_t;
+  std::vector<index_container_t> index;
+
+public:
+  /** Get a const ref to the internal list of indices; the number of indices is
+   * adapted dynamically as the dataset grows in size. */
+  const std::vector<index_container_t> &getAllIndices() const { return index; }
+
+private:
+  /** Finds the position of the least significant unset bit. This drives the
+   * logarithmic method: a new point goes into the tree selected by the lowest
+   * zero bit of the running point count, after that tree has absorbed (and
+   * been rebuilt over) the contents of all smaller trees. */
+  int First0Bit(IndexType num) {
+    int pos = 0;
+    while (num & 1) {
+      num = num >> 1;
+      pos++;
+    }
+    return pos;
+  }
+
+  /** Creates multiple empty trees to handle dynamic support */
+  void init() {
+    typedef KDTreeSingleIndexDynamicAdaptor_<Distance, DatasetAdaptor, DIM>
+        my_kd_tree_t;
+    std::vector<my_kd_tree_t> index_(
+        treeCount, my_kd_tree_t(dim /*dim*/, dataset, treeIndex, index_params));
+    index = index_;
+  }
+
+public:
+  Distance distance;
+
+  /**
+   * KDTree constructor
+   *
+   * Refer to docs in README.md or online in
+   * https://github.com/jlblancoc/nanoflann
+   *
+   * The KD-Tree point dimension (the length of each point in the dataset,
+   * e.g. 3 for 3D points) is determined by means of:
+   * - The \a DIM template parameter if >0 (highest priority)
+   * - Otherwise, the \a dimensionality parameter of this constructor.
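+   *
+   * A hedged usage sketch ("PointCloud" is a user-written adaptor of the
+   * kind documented above; all names are illustrative):
+   * \code
+   * typedef nanoflann::KDTreeSingleIndexDynamicAdaptor<
+   *     nanoflann::L2_Simple_Adaptor<double, PointCloud>, PointCloud, 3>
+   *     my_dynamic_tree_t;
+   * // Points already present in "cloud" are indexed by the constructor
+   * // itself; addPoints() is only needed for points appended afterwards.
+   * my_dynamic_tree_t tree(3, cloud, nanoflann::KDTreeSingleIndexAdaptorParams(10));
+   * tree.addPoints(old_count, cloud.kdtree_get_point_count() - 1);
+   * tree.removePoint(5);  // lazy deletion: index 5 is only masked out
+   * \endcode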
+ * + * @param inputData Dataset with the input features + * @param params Basically, the maximum leaf node size + */ + KDTreeSingleIndexDynamicAdaptor(const int dimensionality, + const DatasetAdaptor &inputData, + const KDTreeSingleIndexAdaptorParams ¶ms = + KDTreeSingleIndexAdaptorParams(), + const size_t maximumPointCount = 1000000000U) + : dataset(inputData), index_params(params), distance(inputData) { + treeCount = static_cast(std::log2(maximumPointCount)); + pointCount = 0U; + dim = dimensionality; + treeIndex.clear(); + if (DIM > 0) + dim = DIM; + m_leaf_max_size = params.leaf_max_size; + init(); + const size_t num_initial_points = dataset.kdtree_get_point_count(); + if (num_initial_points > 0) { + addPoints(0, num_initial_points - 1); + } + } + + /** Deleted copy constructor*/ + KDTreeSingleIndexDynamicAdaptor( + const KDTreeSingleIndexDynamicAdaptor &) = delete; + + /** Add points to the set, Inserts all points from [start, end] */ + void addPoints(IndexType start, IndexType end) { + size_t count = end - start + 1; + treeIndex.resize(treeIndex.size() + count); + for (IndexType idx = start; idx <= end; idx++) { + int pos = First0Bit(pointCount); + index[pos].vind.clear(); + treeIndex[pointCount] = pos; + for (int i = 0; i < pos; i++) { + for (int j = 0; j < static_cast(index[i].vind.size()); j++) { + index[pos].vind.push_back(index[i].vind[j]); + if (treeIndex[index[i].vind[j]] != -1) + treeIndex[index[i].vind[j]] = pos; + } + index[i].vind.clear(); + index[i].freeIndex(index[i]); + } + index[pos].vind.push_back(idx); + index[pos].buildIndex(); + pointCount++; + } + } + + /** Remove a point from the set (Lazy Deletion) */ + void removePoint(size_t idx) { + if (idx >= pointCount) + return; + treeIndex[idx] = -1; + } + + /** + * Find set of nearest neighbors to vec[0:dim-1]. Their indices are stored + * inside the result object. + * + * Params: + * result = the result object in which the indices of the + * nearest-neighbors are stored vec = the vector for which to search the + * nearest neighbors + * + * \tparam RESULTSET Should be any ResultSet + * \return True if the requested neighbors could be found. + * \sa knnSearch, radiusSearch + */ + template + bool findNeighbors(RESULTSET &result, const ElementType *vec, + const SearchParams &searchParams) const { + for (size_t i = 0; i < treeCount; i++) { + index[i].findNeighbors(result, &vec[0], searchParams); + } + return result.full(); + } +}; + +/** An L2-metric KD-tree adaptor for working with data directly stored in an + * Eigen Matrix, without duplicating the data storage. Each row in the matrix + * represents a point in the state space. + * + * Example of usage: + * \code + * Eigen::Matrix mat; + * // Fill out "mat"... + * + * typedef KDTreeEigenMatrixAdaptor< Eigen::Matrix > + * my_kd_tree_t; const int max_leaf = 10; my_kd_tree_t mat_index(mat, max_leaf + * ); mat_index.index->buildIndex(); mat_index.index->... \endcode + * + * \tparam DIM If set to >0, it specifies a compile-time fixed dimensionality + * for the points in the data set, allowing more compiler optimizations. \tparam + * Distance The distance metric to use: nanoflann::metric_L1, + * nanoflann::metric_L2, nanoflann::metric_L2_Simple, etc. + */ +template +struct KDTreeEigenMatrixAdaptor { + typedef KDTreeEigenMatrixAdaptor self_t; + typedef typename MatrixType::Scalar num_t; + typedef typename MatrixType::Index IndexType; + typedef + typename Distance::template traits::distance_t metric_t; + typedef KDTreeSingleIndexAdaptor + index_t; + + index_t *index; //! 
The kd-tree index for the user to call its methods as + //! usual with any other FLANN index. + + /// Constructor: takes a const ref to the matrix object with the data points + KDTreeEigenMatrixAdaptor(const size_t dimensionality, + const std::reference_wrapper &mat, + const int leaf_max_size = 10) + : m_data_matrix(mat) { + const auto dims = mat.get().cols(); + if (size_t(dims) != dimensionality) + throw std::runtime_error( + "Error: 'dimensionality' must match column count in data matrix"); + if (DIM > 0 && int(dims) != DIM) + throw std::runtime_error( + "Data set dimensionality does not match the 'DIM' template argument"); + index = + new index_t(static_cast(dims), *this /* adaptor */, + nanoflann::KDTreeSingleIndexAdaptorParams(leaf_max_size)); + index->buildIndex(); + } + +public: + /** Deleted copy constructor */ + KDTreeEigenMatrixAdaptor(const self_t &) = delete; + + ~KDTreeEigenMatrixAdaptor() { delete index; } + + const std::reference_wrapper m_data_matrix; + + /** Query for the \a num_closest closest points to a given point (entered as + * query_point[0:dim-1]). Note that this is a short-cut method for + * index->findNeighbors(). The user can also call index->... methods as + * desired. \note nChecks_IGNORED is ignored but kept for compatibility with + * the original FLANN interface. + */ + inline void query(const num_t *query_point, const size_t num_closest, + IndexType *out_indices, num_t *out_distances_sq, + const int /* nChecks_IGNORED */ = 10) const { + nanoflann::KNNResultSet resultSet(num_closest); + resultSet.init(out_indices, out_distances_sq); + index->findNeighbors(resultSet, query_point, nanoflann::SearchParams()); + } + + /** @name Interface expected by KDTreeSingleIndexAdaptor + * @{ */ + + const self_t &derived() const { return *this; } + self_t &derived() { return *this; } + + // Must return the number of data points + inline size_t kdtree_get_point_count() const { + return m_data_matrix.get().rows(); + } + + // Returns the dim'th component of the idx'th point in the class: + inline num_t kdtree_get_pt(const IndexType idx, size_t dim) const { + return m_data_matrix.get().coeff(idx, IndexType(dim)); + } + + // Optional bounding-box computation: return false to default to a standard + // bbox computation loop. + // Return true if the BBOX was already computed by the class and returned in + // "bb" so it can be avoided to redo it again. Look at bb.size() to find out + // the expected dimensionality (e.g. 2 or 3 for point clouds) + template bool kdtree_get_bbox(BBOX & /*bb*/) const { + return false; + } + + /** @} */ + +}; // end of KDTreeEigenMatrixAdaptor + /** @} */ + +/** @} */ // end of grouping +} // namespace nanoflann + +#endif /* NANOFLANN_HPP_ */ diff --git a/modules/alphamat/src/cm.cpp b/modules/alphamat/src/cm.cpp new file mode 100644 index 00000000000..19dcb961e4a --- /dev/null +++ b/modules/alphamat/src/cm.cpp @@ -0,0 +1,171 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. 
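+
+// Colour-mixture (CM) information flow. Every unknown pixel is expressed as
+// a convex combination of its K nearest neighbours in (R,G,B,x,y) feature
+// space, following the locally linear embedding idea of Roweis & Saul. For
+// each pixel p with neighbour matrix Z, lle() below solves the regularised
+// system (Z^T Z + eps*I) w = Z^T p + lambda*1, with lambda chosen so that
+// the weights w sum to one.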
+ +#include "precomp.hpp" +#include "intraU.hpp" +#include "cm.hpp" + +namespace cv { namespace alphamat { + +static +void generateFVectorCM(my_vector_of_vectors_t& samples, Mat& img) +{ + int nRows = img.rows; + int nCols = img.cols; + + samples.resize(nRows * nCols); + + int i, j; + + for (i = 0; i < nRows; ++i) + { + for (j = 0; j < nCols; ++j) + { + samples[i * nCols + j].resize(ALPHAMAT_DIM); + samples[i * nCols + j][0] = img.at(i, j)[0] / 255.0; + samples[i * nCols + j][1] = img.at(i, j)[1] / 255.0; + samples[i * nCols + j][2] = img.at(i, j)[2] / 255.0; + samples[i * nCols + j][3] = double(i) / nRows; + samples[i * nCols + j][4] = double(j) / nCols; + } + } +} + +static +void kdtree_CM(Mat& img, my_vector_of_vectors_t& indm, my_vector_of_vectors_t& samples, std::unordered_set& unk) +{ + // Generate feature vectors for intra U: + generateFVectorCM(samples, img); + + // Query point: same as samples from which KD tree is generated + + // construct a kd-tree index: + // Dimensionality set at run-time (default: L2) + // ------------------------------------------------------------ + typedef KDTreeVectorOfVectorsAdaptor my_kd_tree_t; + my_kd_tree_t mat_index(ALPHAMAT_DIM /*dim*/, samples, 10 /* max leaf */); + mat_index.index->buildIndex(); + + // do a knn search with cm = 20 + const size_t num_results = 20 + 1; + + int N = unk.size(); + + std::vector ret_indexes(num_results); + std::vector out_dists_sqr(num_results); + nanoflann::KNNResultSet resultSet(num_results); + + indm.resize(N); + int i = 0; + for (std::unordered_set::iterator it = unk.begin(); it != unk.end(); it++) + { + resultSet.init(&ret_indexes[0], &out_dists_sqr[0]); + mat_index.index->findNeighbors(resultSet, &samples[*it][0], nanoflann::SearchParams(10)); + + indm[i].resize(num_results - 1); + for (std::size_t j = 1; j < num_results; j++) + { + indm[i][j - 1] = ret_indexes[j]; + } + i++; + } +} + +static +void lle(my_vector_of_vectors_t& indm, my_vector_of_vectors_t& samples, float eps, std::unordered_set& unk, + SparseMatrix& Wcm, SparseMatrix& Dcm, Mat& img) +{ + CV_LOG_INFO(NULL, "ALPHAMAT: In cm's lle function"); + int k = indm[0].size(); //number of neighbours that we are considering + int n = indm.size(); //number of unknown pixels + + typedef Triplet T; + std::vector triplets, td; + + my_vector_of_vectors_t wcm; + wcm.resize(n); + + Mat C(20, 20, DataType::type), rhs(20, 1, DataType::type), Z(3, 20, DataType::type), weights(20, 1, DataType::type), pt(3, 1, DataType::type); + Mat ptDotN(20, 1, DataType::type), imd(20, 1, DataType::type); + Mat Cones(20, 1, DataType::type), Cinv(20, 1, DataType::type); + float alpha, beta, lagrangeMult; + Cones += 1; + + C = 0; + rhs = 1; + + int i, ind = 0; + for (std::unordered_set::iterator it = unk.begin(); it != unk.end(); it++) + { + // filling values in Z + i = *it; + + int index_nbr; + for (int j = 0; j < k; j++) + { + index_nbr = indm[ind][j]; + for (int p = 0; p < ALPHAMAT_DIM - 2; p++) + { + Z.at(p, j) = samples[index_nbr][p]; + } + } + pt.at(0, 0) = samples[i][0]; + pt.at(1, 0) = samples[i][1]; + pt.at(2, 0) = samples[i][2]; + + C = Z.t() * Z; + for (int p = 0; p < k; p++) + { + C.at(p, p) += eps; + } + + ptDotN = Z.t() * pt; + solve(C, ptDotN, imd); + alpha = 1 - cv::sum(imd)[0]; + solve(C, Cones, Cinv); + beta = cv::sum(Cinv)[0]; //% sum of elements of inv(corr) + lagrangeMult = alpha / beta; + solve(C, ptDotN + lagrangeMult * Cones, weights); + + float sum = cv::sum(weights)[0]; + weights = weights / sum; + + int cMaj_i = findColMajorInd(i, img.rows, img.cols); + + for 
(int j = 0; j < k; j++) + { + int cMaj_ind_j = findColMajorInd(indm[ind][j], img.rows, img.cols); + triplets.push_back(T(cMaj_i, cMaj_ind_j, weights.at(j, 0))); + td.push_back(T(cMaj_i, cMaj_i, weights.at(j, 0))); + } + ind++; + } + + Wcm.setFromTriplets(triplets.begin(), triplets.end()); + Dcm.setFromTriplets(td.begin(), td.end()); +} + +void cm(Mat& image, Mat& tmap, SparseMatrix& Wcm, SparseMatrix& Dcm) +{ + my_vector_of_vectors_t samples, indm, Euu; + + int i, j; + std::unordered_set unk; + for (i = 0; i < tmap.rows; i++) + { + for (j = 0; j < tmap.cols; j++) + { + uchar pix = tmap.at(i, j); + if (pix == 128) + unk.insert(i * tmap.cols + j); + } + } + + kdtree_CM(image, indm, samples, unk); + float eps = 0.00001; + lle(indm, samples, eps, unk, Wcm, Dcm, image); + CV_LOG_INFO(NULL, "ALPHAMAT: cm DONE"); +} + +}} // namespace cv::alphamat diff --git a/modules/alphamat/src/cm.hpp b/modules/alphamat/src/cm.hpp new file mode 100644 index 00000000000..2e6baf431e3 --- /dev/null +++ b/modules/alphamat/src/cm.hpp @@ -0,0 +1,17 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. + +#ifndef __OPENCV_ALPHAMAT_CM_H__ +#define __OPENCV_ALPHAMAT_CM_H__ + +namespace cv { namespace alphamat { + +using namespace Eigen; +using namespace nanoflann; + +void cm(Mat& image, Mat& tmap, SparseMatrix& Wcm, SparseMatrix& Dcm); + +}} + +#endif diff --git a/modules/alphamat/src/infoflow.cpp b/modules/alphamat/src/infoflow.cpp new file mode 100644 index 00000000000..e85ed8db5d1 --- /dev/null +++ b/modules/alphamat/src/infoflow.cpp @@ -0,0 +1,130 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. 
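+
+// Final energy minimisation. The flow Laplacians are combined as
+//   L_ifm = (Dcm - Wcm)^T (Dcm - Wcm) + s_l*(Dl - Wl) + s_uu*(Duu - Wuu)
+// and the matte is the solution of the sparse system
+//   (L_ifm + lambda*T) x = lambda*T*wf,
+// where T marks the known trimap pixels and wf holds their known alpha
+// values. solve() below runs Eigen's conjugate-gradient solver, then clamps
+// x to [0, 1] and scales it to an 8-bit matte.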
+ +#include "precomp.hpp" + +#include + +using namespace Eigen; + +namespace cv { namespace alphamat { + +static +void solve(SparseMatrix Wcm, SparseMatrix Wuu, SparseMatrix Wl, SparseMatrix Dcm, + SparseMatrix Duu, SparseMatrix Dl, SparseMatrix T, + Mat& wf, Mat& alpha) +{ + float suu = 0.01, sl = 0.1, lamd = 100; + + SparseMatrix Lifm = ((Dcm - Wcm).transpose()) * (Dcm - Wcm) + sl * (Dl - Wl) + suu * (Duu - Wuu); + + SparseMatrix A; + int n = wf.rows; + VectorXd b(n), x(n); + + Eigen::VectorXd wf_; + cv2eigen(wf, wf_); + + A = Lifm + lamd * T; + b = (lamd * T) * (wf_); + + ConjugateGradient, Lower | Upper> cg; + + cg.setMaxIterations(500); + cg.compute(A); + x = cg.solve(b); + CV_LOG_INFO(NULL, "ALPHAMAT: #iterations: " << cg.iterations()); + CV_LOG_INFO(NULL, "ALPHAMAT: estimated error: " << cg.error()); + + int nRows = alpha.rows; + int nCols = alpha.cols; + float pix_alpha; + for (int j = 0; j < nCols; ++j) + { + for (int i = 0; i < nRows; ++i) + { + pix_alpha = x(i + j * nRows); + if (pix_alpha < 0) + pix_alpha = 0; + if (pix_alpha > 1) + pix_alpha = 1; + alpha.at(i, j) = uchar(pix_alpha * 255); + } + } +} + +void infoFlow(InputArray image_ia, InputArray tmap_ia, OutputArray result) +{ + Mat image = image_ia.getMat(); + Mat tmap = tmap_ia.getMat(); + + int64 begin = cv::getTickCount(); + + int nRows = image.rows; + int nCols = image.cols; + int N = nRows * nCols; + + SparseMatrix T(N, N); + typedef Triplet Tr; + std::vector triplets; + + //Pre-process trimap + for (int i = 0; i < nRows; ++i) + { + for (int j = 0; j < nCols; ++j) + { + uchar& pix = tmap.at(i, j); + if (pix <= 0.2f * 255) + pix = 0; + else if (pix >= 0.8f * 255) + pix = 255; + else + pix = 128; + } + } + + Mat wf = Mat::zeros(nRows * nCols, 1, CV_8U); + + // Column Major Interpretation for working with SparseMatrix + for (int i = 0; i < nRows; ++i) + { + for (int j = 0; j < nCols; ++j) + { + uchar pix = tmap.at(i, j); + + // collection of known pixels samples + triplets.push_back(Tr(i + j * nRows, i + j * nRows, (pix != 128) ? 1 : 0)); + + // foreground pixel + wf.at(i + j * nRows, 0) = (pix > 200) ? 1 : 0; + } + } + + SparseMatrix Wl(N, N), Dl(N, N); + local_info(image, tmap, Wl, Dl); + + SparseMatrix Wcm(N, N), Dcm(N, N); + cm(image, tmap, Wcm, Dcm); + + Mat new_tmap = tmap.clone(); + + SparseMatrix Wuu(N, N), Duu(N, N); + Mat image_t = image.t(); + Mat tmap_t = tmap.t(); + UU(image, tmap, Wuu, Duu); + + double elapsed_secs = ((double)(getTickCount() - begin)) / getTickFrequency(); + + T.setFromTriplets(triplets.begin(), triplets.end()); + + Mat alpha = Mat::zeros(nRows, nCols, CV_8UC1); + solve(Wcm, Wuu, Wl, Dcm, Duu, Dl, T, wf, alpha); + + alpha.copyTo(result); + + elapsed_secs = ((double)(getTickCount() - begin)) / getTickFrequency(); + CV_LOG_INFO(NULL, "ALPHAMAT: total time: " << elapsed_secs); +} + +}} // namespace cv::alphamat diff --git a/modules/alphamat/src/intraU.cpp b/modules/alphamat/src/intraU.cpp new file mode 100644 index 00000000000..167458fcfc1 --- /dev/null +++ b/modules/alphamat/src/intraU.cpp @@ -0,0 +1,152 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. 
+ +#include "precomp.hpp" +#include "intraU.hpp" + +namespace cv { namespace alphamat { + +int findColMajorInd(int rowMajorInd, int nRows, int nCols) +{ + int iInd = rowMajorInd / nCols; + int jInd = rowMajorInd % nCols; + return (jInd * nRows + iInd); +} + +static +void generateFVectorIntraU(my_vector_of_vectors_t& samples, Mat& img, Mat& tmap, std::vector& orig_ind) +{ + int nRows = img.rows; + int nCols = img.cols; + int unk_count = 0; + int i, j; + for (i = 0; i < nRows; ++i) + { + for (j = 0; j < nCols; ++j) + { + uchar pix = tmap.at(i, j); + if (pix == 128) + unk_count++; + } + } + samples.resize(unk_count); + orig_ind.resize(unk_count); + + int c1 = 0; + for (i = 0; i < nRows; ++i) + { + for (j = 0; j < nCols; ++j) + { + uchar pix = tmap.at(i, j); + if (pix == 128) // collection of unknown pixels samples + { + samples[c1].resize(ALPHAMAT_DIM); + samples[c1][0] = img.at(i, j)[0] / 255.0; + samples[c1][1] = img.at(i, j)[1] / 255.0; + samples[c1][2] = img.at(i, j)[2] / 255.0; + samples[c1][3] = (double(i + 1) / nRows) / 20; + samples[c1][4] = (double(j + 1) / nCols) / 20; + orig_ind[c1] = i * nCols + j; + c1++; + } + } + } + + CV_LOG_INFO(NULL, "ALPHAMAT: Total number of unknown pixels : " << c1); +} + +static +void kdtree_intraU(Mat& img, Mat& tmap, my_vector_of_vectors_t& indm, my_vector_of_vectors_t& samples, std::vector& orig_ind) +{ + // Generate feature vectors for intra U: + generateFVectorIntraU(samples, img, tmap, orig_ind); + + typedef KDTreeVectorOfVectorsAdaptor my_kd_tree_t; + my_kd_tree_t mat_index(ALPHAMAT_DIM /*dim*/, samples, 10 /* max leaf */); + mat_index.index->buildIndex(); + // do a knn search with ku = 5 + const size_t num_results = 5 + 1; + + int N = samples.size(); // no. of unknown samples + + std::vector ret_indexes(num_results); + std::vector out_dists_sqr(num_results); + nanoflann::KNNResultSet resultSet(num_results); + + indm.resize(N); + for (int i = 0; i < N; i++) + { + resultSet.init(&ret_indexes[0], &out_dists_sqr[0]); + mat_index.index->findNeighbors(resultSet, &samples[i][0], nanoflann::SearchParams(10)); + + indm[i].resize(num_results - 1); + for (std::size_t j = 1; j < num_results; j++) + { + indm[i][j - 1] = ret_indexes[j]; + } + } +} + +static +double l1norm(std::vector& x, std::vector& y) +{ + double sum = 0; + for (int i = 0; i < ALPHAMAT_DIM; i++) + sum += abs(x[i] - y[i]); + return sum / ALPHAMAT_DIM; +} + +static +void intraU(Mat& img, my_vector_of_vectors_t& indm, my_vector_of_vectors_t& samples, + std::vector& orig_ind, SparseMatrix& Wuu, SparseMatrix& Duu) +{ + // input: indm, samples + int n = indm.size(); // num of unknown samples + CV_LOG_INFO(NULL, "ALPHAMAT: num of unknown samples, n : " << n); + + int i, j, nbr_ind; + for (i = 0; i < n; i++) + { + samples[i][3] *= 1 / 100; + samples[i][4] *= 1 / 100; + } + + my_vector_of_vectors_t weights; + typedef Triplet T; + std::vector triplets, td; + + double weight; + for (i = 0; i < n; i++) + { + int num_nbr = indm[i].size(); + int cMaj_i = findColMajorInd(orig_ind[i], img.rows, img.cols); + for (j = 0; j < num_nbr; j++) + { + nbr_ind = indm[i][j]; + int cMaj_nbr_j = findColMajorInd(orig_ind[nbr_ind], img.rows, img.cols); + weight = max(1 - l1norm(samples[i], samples[j]), 0.0); + + triplets.push_back(T(cMaj_i, cMaj_nbr_j, weight / 2)); + td.push_back(T(cMaj_i, cMaj_i, weight / 2)); + + triplets.push_back(T(cMaj_nbr_j, cMaj_i, weight / 2)); + td.push_back(T(cMaj_nbr_j, cMaj_nbr_j, weight / 2)); + } + } + + Wuu.setFromTriplets(triplets.begin(), triplets.end()); + 
Duu.setFromTriplets(td.begin(), td.end()); +} + +void UU(Mat& image, Mat& tmap, SparseMatrix& Wuu, SparseMatrix& Duu) +{ + my_vector_of_vectors_t samples, indm; + std::vector orig_ind; + + kdtree_intraU(image, tmap, indm, samples, orig_ind); + intraU(image, indm, samples, orig_ind, Wuu, Duu); + CV_LOG_INFO(NULL, "ALPHAMAT: Intra U Done"); +} + +}} // namespace cv::alphamat diff --git a/modules/alphamat/src/intraU.hpp b/modules/alphamat/src/intraU.hpp new file mode 100644 index 00000000000..fdd7dcf639f --- /dev/null +++ b/modules/alphamat/src/intraU.hpp @@ -0,0 +1,23 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. + +#ifndef __OPENCV_ALPHAMAT_INTRAU_H__ +#define __OPENCV_ALPHAMAT_INTRAU_H__ + +namespace cv { namespace alphamat { + +const int ALPHAMAT_DIM = 5; // dimension of feature vectors + +using namespace Eigen; +using namespace nanoflann; + +typedef std::vector> my_vector_of_vectors_t; + +int findColMajorInd(int rowMajorInd, int nRows, int nCols); + +void UU(Mat& image, Mat& tmap, SparseMatrix& Wuu, SparseMatrix& Duu); + +}} // namespace + +#endif diff --git a/modules/alphamat/src/local_info.cpp b/modules/alphamat/src/local_info.cpp new file mode 100644 index 00000000000..5460e864351 --- /dev/null +++ b/modules/alphamat/src/local_info.cpp @@ -0,0 +1,153 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. + +// #ifndef local_info +// #define local_info + +#include "precomp.hpp" +#include "local_info.hpp" + +namespace cv { namespace alphamat { + +void local_info(Mat& img, Mat& tmap, SparseMatrix& Wl, SparseMatrix& Dl) +{ + float eps = 0.000001; + int win_size = 1; + + int nRows = img.rows; + int nCols = img.cols; + int N = img.rows * img.cols; + Mat unk_img = Mat::zeros(cv::Size(nCols, nRows), CV_32FC1); + + for (int i = 0; i < nRows; ++i) + { + for (int j = 0; j < nCols; ++j) + { + uchar pix = tmap.at(i, j); + if (pix == 128) // collection of unknown pixels samples + { + unk_img.at(i, j) = 255; + } + } + } + + Mat element = getStructuringElement(MORPH_RECT, Size(2 * win_size + 1, 2 * win_size + 1)); + /// Apply the dilation operation + Mat dilation_dst = unk_img.clone(); + //dilate(unk_img, dilation_dst, element); + + int num_win = (win_size * 2 + 1) * (win_size * 2 + 1); // number of pixels in window + typedef Triplet T; + std::vector triplets, td, tl; + int neighInd[9]; + int i, j; + for (j = win_size; j < nCols - win_size; j++) + { + for (i = win_size; i < nRows - win_size; i++) + { + uchar pix = tmap.at(i, j); + //std::cout << i+j*nRows << " --> " << pix << std::endl; + if (pix != 128) + continue; + // extract the window out of image + Mat win = img.rowRange(i - win_size, i + win_size + 1); + win = win.colRange(j - win_size, j + win_size + 1); + Mat win_ravel = Mat::zeros(9, 3, CV_64F); // doubt ?? 
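+            // win_ravel: the (2*win_size+1)^2 window pixels flattened
+            // column-major into a num_win x 3 matrix of normalised RGB
+            // values; its mean and covariance drive the matting-Laplacian
+            // weights of Levin et al. computed below.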
+ double sum1 = 0; + double sum2 = 0; + double sum3 = 0; + + int c = 0; + for (int q = -1; q <= 1; q++) + { + for (int p = -1; p <= 1; p++) + { + neighInd[c] = (j + q) * nRows + (i + p); // column major + c++; + } + } + + c = 0; + //parsing column major way in the window + for (int q = 0; q < win_size * 2 + 1; q++) + { + for (int p = 0; p < win_size * 2 + 1; p++) + { + win_ravel.at(c, 0) = win.at(p, q)[0] / 255.0; + win_ravel.at(c, 1) = win.at(p, q)[1] / 255.0; + win_ravel.at(c, 2) = win.at(p, q)[2] / 255.0; + sum1 += win.at(p, q)[0] / 255.0; + sum2 += win.at(p, q)[1] / 255.0; + sum3 += win.at(p, q)[2] / 255.0; + c++; + } + } + win = win_ravel; + Mat win_mean = Mat::zeros(1, 3, CV_64F); + win_mean.at(0, 0) = sum1 / num_win; + win_mean.at(0, 1) = sum2 / num_win; + win_mean.at(0, 2) = sum3 / num_win; + + // calculate the covariance matrix + Mat covariance = (win.t() * win / num_win) - (win_mean.t() * win_mean); + + Mat I = Mat::eye(img.channels(), img.channels(), CV_64F); + Mat I1 = (covariance + (eps / num_win) * I); + Mat I1_inv = I1.inv(); + + Mat X = win - repeat(win_mean, num_win, 1); + Mat vals = (1 + X * I1_inv * X.t()) / num_win; + + for (int q = 0; q < num_win; q++) + { + for (int p = 0; p < num_win; p++) + { + triplets.push_back(T(neighInd[p], neighInd[q], vals.at(p, q))); + } + } + } + } + + std::vector tsp; + SparseMatrix W(N, N), Wsp(N, N); + W.setFromTriplets(triplets.begin(), triplets.end()); + + SparseMatrix Wt = W.transpose(); + SparseMatrix Ws = Wt + W; + W = Ws; + + for (int k = 0; k < W.outerSize(); ++k) + { + double sumCol = 0; + for (SparseMatrix::InnerIterator it(W, k); it; ++it) + { + sumCol += it.value(); + } + if (sumCol < 0.05) + sumCol = 1; + tsp.push_back(T(k, k, 1 / sumCol)); + } + Wsp.setFromTriplets(tsp.begin(), tsp.end()); + + Wl = Wsp * W; // For normalization + //Wl = W; // No normalization + + SparseMatrix Wlt = Wl.transpose(); + + for (int k = 0; k < Wlt.outerSize(); ++k) + { + double sumarr = 0; + for (SparseMatrix::InnerIterator it(Wlt, k); it; ++it) + sumarr += it.value(); + td.push_back(T(k, k, sumarr)); + } + + Dl.setFromTriplets(td.begin(), td.end()); + + CV_LOG_INFO(NULL, "ALPHAMAT: local_info DONE"); +} + +}} // namespace cv::alphamat + +// #endif diff --git a/modules/alphamat/src/local_info.hpp b/modules/alphamat/src/local_info.hpp new file mode 100644 index 00000000000..178add2e579 --- /dev/null +++ b/modules/alphamat/src/local_info.hpp @@ -0,0 +1,17 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. + +#ifndef __OPENCV_ALPHAMAT_LOCAL_INFO_H__ +#define __OPENCV_ALPHAMAT_LOCAL_INFO_H__ + + +namespace cv { namespace alphamat { + +using namespace Eigen; + +void local_info(Mat& img, Mat& tmap, SparseMatrix& Wl, SparseMatrix& Dl); + +}} // namespace + +#endif diff --git a/modules/alphamat/src/precomp.hpp b/modules/alphamat/src/precomp.hpp new file mode 100644 index 00000000000..e043d813ded --- /dev/null +++ b/modules/alphamat/src/precomp.hpp @@ -0,0 +1,31 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. 
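+
+// Precompiled header for the alphamat module: aggregates the OpenCV headers
+// the sources need, the bundled nanoflann k-d tree (3rdparty/), the optional
+// Eigen sparse solvers (HAVE_EIGEN) and the per-flow headers of this module.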
+
+#ifndef __OPENCV_PRECOMP_H__
+#define __OPENCV_PRECOMP_H__
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+
+#include "3rdparty/nanoflann.hpp"
+#include "3rdparty/KDTreeVectorOfVectorsAdaptor.h"
+
+#ifdef HAVE_EIGEN
+#include
+#include
+#include
+#endif
+
+#include "intraU.hpp"
+#include "cm.hpp"
+#include "local_info.hpp"
+
+#endif
diff --git a/modules/alphamat/tutorials/alphamat_tutorial.markdown b/modules/alphamat/tutorials/alphamat_tutorial.markdown
new file mode 100644
index 00000000000..03cd329f5f2
--- /dev/null
+++ b/modules/alphamat/tutorials/alphamat_tutorial.markdown
@@ -0,0 +1,53 @@
+Information Flow Alpha Matting {#tutorial_alphamat}
+============================
+
+This project was part of Google Summer of Code 2019.
+
+*Student:* Muskaan Kularia
+
+*Mentor:* Sunita Nayak
+
+Alpha matting is the problem of extracting the foreground from an image. The extracted foreground can be used for further operations, such as changing the background of an image.
+
+Given an input image and its corresponding trimap, we try to extract the foreground from the background. The following is an example:
+
+Input Image: ![](samples/input_images/plant.jpg)
+Input Trimap: ![](samples/trimaps/plant.png)
+Output alpha matte: ![](samples/output_mattes/plant_result.jpg)
+
+This project is an implementation of @cite aksoy2017designing. It also required implementing parts of other papers [2,3,4].
+
+# Building
+
+This module uses the Eigen package.
+
+Build the sample code of the alphamat module by running the following two CMake commands inside your build folder:
+```
+cmake -DOPENCV_EXTRA_MODULES_PATH= -DBUILD_EXAMPLES=ON ..
+
+cmake --build . --config Release --target example_alphamat_information_flow_matting
+```
+Please refer to the OpenCV building tutorials for further details, if needed.
+
+# Testing
+
+The built target can be tested as follows:
+```
+example_alphamat_information_flow_matting -img= -tri= -out=
+```
+# Source Code of the sample
+
+@includelineno alphamat/samples/information_flow_matting.cpp
+
+
+# References
+
+[1] Yagiz Aksoy, Tunc Ozan Aydin, Marc Pollefeys, "[Designing Effective Inter-Pixel Information Flow for Natural Image Matting](http://people.inf.ethz.ch/aksoyy/ifm/)", CVPR, 2017.
+
+[2] Sam T. Roweis, Lawrence K. Saul, "[Nonlinear dimensionality reduction by locally linear embedding](https://science.sciencemag.org/content/290/5500/2323)", Science 290.5500 (2000): 2323-2326.
+
+[3] Anat Levin, Dani Lischinski, Yair Weiss, "[A Closed Form Solution to Natural Image Matting](https://www.researchgate.net/publication/5764820_A_Closed-Form_Solution_to_Natural_Image_Matting)", IEEE TPAMI, 2008.
+
+[4] Qifeng Chen, Dingzeyu Li, Chi-Keung Tang, "[KNN Matting](http://dingzeyu.li/files/knn-matting-tpami.pdf)", IEEE TPAMI, 2013.
+
+[5] Yagiz Aksoy, "[Affinity Based Matting Toolbox](https://github.com/yaksoy/AffinityBasedMattingToolbox)".
diff --git a/modules/aruco/include/opencv2/aruco.hpp b/modules/aruco/include/opencv2/aruco.hpp
index f0d34ed98ac..3cf62d2accd 100644
--- a/modules/aruco/include/opencv2/aruco.hpp
+++ b/modules/aruco/include/opencv2/aruco.hpp
@@ -392,8 +392,8 @@ class CV_EXPORTS_W GridBoard : public Board {
 * Note that returning a 0 means the pose has not been estimated.
*/ CV_EXPORTS_W int estimatePoseBoard(InputArrayOfArrays corners, InputArray ids, const Ptr &board, - InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, - OutputArray tvec, bool useExtrinsicGuess = false); + InputArray cameraMatrix, InputArray distCoeffs, InputOutputArray rvec, + InputOutputArray tvec, bool useExtrinsicGuess = false); diff --git a/modules/aruco/include/opencv2/aruco/charuco.hpp b/modules/aruco/include/opencv2/aruco/charuco.hpp index 14aa5713e87..2e6ae62865a 100644 --- a/modules/aruco/include/opencv2/aruco/charuco.hpp +++ b/modules/aruco/include/opencv2/aruco/charuco.hpp @@ -184,8 +184,8 @@ CV_EXPORTS_W int interpolateCornersCharuco(InputArrayOfArrays markerCorners, Inp */ CV_EXPORTS_W bool estimatePoseCharucoBoard(InputArray charucoCorners, InputArray charucoIds, const Ptr &board, InputArray cameraMatrix, - InputArray distCoeffs, OutputArray rvec, OutputArray tvec, - bool useExtrinsicGuess = false); + InputArray distCoeffs, InputOutputArray rvec, + InputOutputArray tvec, bool useExtrinsicGuess = false); diff --git a/modules/aruco/misc/pattern_generator/MarkerPrinter.py b/modules/aruco/misc/pattern_generator/MarkerPrinter.py new file mode 100644 index 00000000000..301f4f918fe --- /dev/null +++ b/modules/aruco/misc/pattern_generator/MarkerPrinter.py @@ -0,0 +1,1296 @@ +#!/usr/bin/env python3 + +# SPDX-License-Identifier: BSD-3-Clause +# +# Copyright (c) 2019, Josh Chien. All rights reserved. + +from argparse import ArgumentParser +import numpy as np +from PIL import Image +import io +import warnings +import os +import cairo +from cairosvg import svg2png +import math +import tempfile + +def SaveArucoDictBytesList(filePath = "arucoDictBytesList.npz"): + import numpy as np + + # cv2 is optional dependency + try: + import cv2 + from cv2 import aruco + + # Name, Flag + dictInfo = \ + [ + ("DICT_4X4_1000", aruco.DICT_4X4_1000), + ("DICT_5X5_1000", aruco.DICT_5X5_1000), + ("DICT_6X6_1000", aruco.DICT_6X6_1000), + ("DICT_7X7_1000", aruco.DICT_7X7_1000), + ("DICT_ARUCO_ORIGINAL", aruco.DICT_ARUCO_ORIGINAL), + ("DICT_APRILTAG_16h5", aruco.DICT_APRILTAG_16h5), + ("DICT_APRILTAG_25h9", aruco.DICT_APRILTAG_25h9), + ("DICT_APRILTAG_36h10", aruco.DICT_APRILTAG_36h10), + ("DICT_APRILTAG_36h11", aruco.DICT_APRILTAG_36h11), + ] + + arucoDictBytesList = {} + for name, flag in dictInfo: + arucoDict = aruco.Dictionary_get(flag) + arucoDictBytesList[name] = arucoDict.bytesList + + np.savez_compressed(filePath, **arucoDictBytesList) + return arucoDictBytesList + + except Exception as e: + warnings.warn(str(e)) + return None + + return None + +class MarkerPrinter: + + debugMode = None # "LINE" "BLOCK" + + # Static Vars + # SVG https://oreillymedia.github.io/Using_SVG/guide/units.html + # for PDF and SVG, 1 pixel = 1/72 inch, 1 cm = 1/2.54 inch, 1pixl = 2.54/72 cm, 1cm = 72/2.54 pixels + ptPerMeter = 72 / 2.54 * 100 + + surface = { + ".SVG": cairo.SVGSurface, + ".PDF": cairo.PDFSurface, + ".PS": cairo.PSSurface } + + if (os.path.isfile("arucoDictBytesList.npz")): + arucoDictBytesList = np.load("arucoDictBytesList.npz") + else: + warnings.warn("Missing build-in arucoDictBytesList.npz, generate it again") + arucoDictBytesList = SaveArucoDictBytesList(filePath = "arucoDictBytesList.npz") + + arucoDictMarkerSize = \ + { + "DICT_4X4_1000": 4, + "DICT_5X5_1000": 5, + "DICT_6X6_1000": 6, + "DICT_7X7_1000": 7, + "DICT_ARUCO_ORIGINAL": 5, + "DICT_APRILTAG_16h5": 4, + "DICT_APRILTAG_25h9": 5, + "DICT_APRILTAG_36h10": 6, + "DICT_APRILTAG_36h11": 6, + } + + def ArucoBits(dictionary, markerID): + 
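+        """Unpack the packed bytesList row of an aruco dictionary into a
+        markerSize x markerSize boolean bit matrix (True bits are drawn as
+        white marker modules)."""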
bytesList = MarkerPrinter.arucoDictBytesList[dictionary][markerID].ravel() + markerSize = MarkerPrinter.arucoDictMarkerSize[dictionary] + + arucoBits = np.zeros(shape = (markerSize, markerSize), dtype = bool) + base2List = np.array( [128, 64, 32, 16, 8, 4, 2, 1], dtype = np.uint8) + currentByteIdx = 0 + currentByte = bytesList[currentByteIdx] + currentBit = 0 + for row in range(markerSize): + for col in range(markerSize): + if(currentByte >= base2List[currentBit]): + arucoBits[row, col] = True + currentByte -= base2List[currentBit] + currentBit = currentBit + 1 + if(currentBit == 8): + currentByteIdx = currentByteIdx + 1 + currentByte = bytesList[currentByteIdx] + if(8 * (currentByteIdx + 1) > arucoBits.size): + currentBit = 8 * (currentByteIdx + 1) - arucoBits.size + else: + currentBit = 0; + return arucoBits + + def __DrawBlock(context, + dictionary = None, markerLength = None, borderBits = 1, + chessboardSize = (1, 1), squareLength = None, firstMarkerID = 0, + blockX = 0, blockY = 0, originX = 0, originY = 0, pageBorderX = 0, pageBorderY = 0, + mode = "CHESS" ): + + if(squareLength is None): + squareLength = markerLength + + if(markerLength is None): + markerLength = squareLength + + if((squareLength is None) or (markerLength is None)): + raise ValueError("lenght is None") + + dawMarkerBlock = False + if ((mode == "ARUCO") or (mode == "ARUCOGRID")): + dawMarkerBlock = True + elif(chessboardSize[1] % 2 == 0): + dawMarkerBlock = (( blockX % 2 == 0 ) == ( blockY % 2 == 0 )) + else: + dawMarkerBlock = (( blockX % 2 == 0 ) != ( blockY % 2 == 0 )) + + if(dawMarkerBlock): + if (mode != "CHESS"): + if(dictionary is None): + raise ValueError("dictionary is None") + + if (mode == "CHARUCO"): + originX = (blockX - originX) * squareLength + (squareLength - markerLength)*0.5 + pageBorderX + originY = (blockY - originY) * squareLength + (squareLength - markerLength)*0.5 + pageBorderY + else: + originX = (blockX - originX) * squareLength + pageBorderX + originY = (blockY - originY) * squareLength + pageBorderY + + context.set_source_rgba(0.0, 0.0, 0.0, 1.0) + context.rectangle(originX, originY, markerLength, markerLength) + context.fill() + + # Generate marker + if (mode == "CHARUCO"): + markerID = firstMarkerID + (blockY * chessboardSize[0] + blockX) // 2 + elif (mode == "ARUCO"): + markerID = firstMarkerID + elif (mode == "ARUCOGRID"): + markerID = firstMarkerID + (blockY * chessboardSize[0] + blockX) + + marker = MarkerPrinter.ArucoBits(dictionary, markerID) + markerSize = marker.shape[0] + unitLength = markerLength / (float)(markerSize + borderBits * 2) + + markerBitMap = np.zeros(shape = (markerSize+borderBits*2, markerSize+borderBits*2), dtype = bool) + markerBitMap[borderBits:-borderBits,borderBits:-borderBits] = marker + markerBitMap = np.swapaxes(markerBitMap, 0, 1) + + # Compute edges + hEdges = np.zeros(shape = (markerSize+1,markerSize+1), dtype = bool) + vEdges = np.zeros(shape = (markerSize+1,markerSize+1), dtype = bool) + + for mx in range(markerSize): + for my in range(markerSize+1): + if ( markerBitMap[mx + borderBits, my + borderBits - 1] ^ markerBitMap[mx + borderBits, my + borderBits]): + hEdges[mx, my] = True + + for mx in range(markerSize+1): + for my in range(markerSize): + if ( markerBitMap[mx + borderBits - 1, my + borderBits] ^ markerBitMap[mx + borderBits, my + borderBits]): + vEdges[mx, my] = True + + # Use for debug, check edge or position is correct or not + if(MarkerPrinter.debugMode is not None): + if(MarkerPrinter.debugMode.upper() == "LINE"): + 
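+                        # LINE debug mode: stroke the extracted bit-boundary
+                        # edges (hEdges/vEdges) instead of filling the marker,
+                        # to make edge extraction easy to verify visually.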
context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.set_line_width(unitLength * 0.1) + for mx in range(markerSize+1): + for my in range(markerSize+1): + if(hEdges[mx, my]): + context.move_to(originX + unitLength * (mx + borderBits ), originY + unitLength * (my + borderBits )) + context.line_to(originX + unitLength * (mx + borderBits + 1), originY + unitLength * (my + borderBits )) + context.stroke() + if(vEdges[mx, my]): + context.move_to(originX + unitLength * (mx + borderBits ), originY + unitLength * (my + borderBits )) + context.line_to(originX + unitLength * (mx + borderBits ), originY + unitLength * (my + borderBits + 1)) + context.stroke() + + elif(MarkerPrinter.debugMode.upper() == "BLOCK"): + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + for mx in range(markerSize): + for my in range(markerSize): + if(markerBitMap[mx + borderBits, my + borderBits]): + context.rectangle( + originX + unitLength * (mx + borderBits), + originY + unitLength * (my + borderBits), + unitLength, unitLength) + context.fill() + + else: + while(True): + found = False + + # Find start position + sx = 0 + sy = 0 + for my in range(markerSize): + for mx in range(markerSize): + if(hEdges[mx, my]): + found = True + sx = mx + sy = my + if(markerBitMap[sx + borderBits, sy + borderBits - 1]): + context.set_source_rgba(0.0, 0.0, 0.0, 1.0) + else: + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + break + if(found): + break + + context.move_to (originX + unitLength * (sx + borderBits), originY + unitLength * (sy + borderBits)) + + # Use wall follower maze solving algorithm to draw white part + cx = sx + cy = sy + cd = 3 # 0 right, 1 down, 2 left, 3 up + while(True): + nd = (cd + 1)%4 + moved = False + if(nd == 0): + if(hEdges[cx, cy]): + hEdges[cx, cy] = False + cx = cx + 1 + moved = True + elif(nd == 1): + if(vEdges[cx, cy]): + vEdges[cx, cy] = False + cy = cy + 1 + moved = True + elif(nd == 2): + if(hEdges[cx - 1, cy]): + hEdges[cx - 1, cy] = False + cx = cx - 1 + moved = True + elif(nd == 3): + if(vEdges[cx, cy - 1]): + vEdges[cx, cy - 1] = False + cy = cy - 1 + moved = True + + if((cx == sx) and (cy == sy)): + context.close_path () + break + else: + if(moved): + context.line_to(originX + unitLength * (cx + borderBits), originY + unitLength * (cy + borderBits)) + cd = nd + + if (found): + context.fill() + else: + break + + else: + originX = (blockX - originX) * squareLength + pageBorderX + originY = (blockY - originY) * squareLength + pageBorderY + context.set_source_rgba(0.0, 0.0, 0.0, 1.0) + context.rectangle(originX, originY, squareLength, squareLength) + context.fill() + + def __CheckChessMarkerImage(chessboardSize, squareLength, subSize=None, pageBorder=(0,0)): + if(len(chessboardSize) != 2): + raise ValueError("len(chessboardSize) != 2") + else: + sizeX, sizeY = chessboardSize + + if(len(pageBorder) != 2): + raise ValueError("len(pageBorder) != 2") + else: + pageBorderX, pageBorderY = pageBorder + + if(sizeX <= 1): + raise ValueError("sizeX <= 1") + + if(sizeY <= 1): + raise ValueError("sizeY <= 1") + + if(squareLength <= 0): + raise ValueError("squareLength <= 0") + + if(pageBorderX < 0): + raise ValueError("pageBorderX < 0") + + if(pageBorderY < 0): + raise ValueError("pageBorderY < 0") + + if(subSize is not None): + subSizeX, subSizeY = subSize + + if(subSizeX < 0): + raise ValueError("subSizeX < 0") + + if(subSizeY < 0): + raise ValueError("subSizeY < 0") + + def PreviewChessMarkerImage(chessboardSize, squareLength, pageBorder=(0, 0), dpi=96): + MarkerPrinter.__CheckChessMarkerImage(chessboardSize, 
squareLength, pageBorder=pageBorder) + + squareLength = squareLength * MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + + prevImage = None + with tempfile.TemporaryDirectory() as tmpdirname: + with MarkerPrinter.surface[".SVG"] ( + os.path.join(tmpdirname, "tempSVG.svg"), + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + chessboardSize[0] * squareLength, + chessboardSize[1] * squareLength) + context.fill() + + for bx in range(chessboardSize[0]): + for by in range(chessboardSize[1]): + MarkerPrinter.__DrawBlock( + context = context, + chessboardSize = chessboardSize, + squareLength = squareLength, + blockX = bx, + blockY = by, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "CHESS") + + with open(os.path.join(tmpdirname, "tempSVG.svg")) as file: + prevImage = Image.open(io.BytesIO(svg2png(bytestring=file.read(), dpi=dpi))) + + return prevImage + + def GenChessMarkerImage(filePath, chessboardSize, squareLength, subSize=None, pageBorder=(0, 0)): + MarkerPrinter.__CheckChessMarkerImage(chessboardSize, squareLength, subSize=subSize, pageBorder=pageBorder) + + squareLength = squareLength * MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + + # Check + path, nameExt = os.path.split(filePath) + name, ext = os.path.splitext(nameExt) + + if(len(path) > 0): + if not(os.path.isdir(path)): + os.makedirs(path) + + if((ext.upper() != ".SVG") and (ext.upper() != ".PS") and (ext.upper() != ".PDF")): + raise ValueError("file extention is not supported, should be: svg, ps, pdf") + + # Draw + with MarkerPrinter.surface[ext.upper()] ( + filePath, + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + chessboardSize[0] * squareLength, + chessboardSize[1] * squareLength) + context.fill() + + for bx in range(chessboardSize[0]): + for by in range(chessboardSize[1]): + MarkerPrinter.__DrawBlock( + context = context, + chessboardSize = chessboardSize, + squareLength = squareLength, + blockX = bx, + blockY = by, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "CHESS" ) + + if(subSize is not None): + subDivide = (\ + chessboardSize[0] // subSize[0] + int(chessboardSize[0] % subSize[0] > 0), + chessboardSize[1] // subSize[1] + int(chessboardSize[1] % subSize[1] > 0)) + + subChessboardBlockX = np.clip ( np.arange(0, subSize[0] * subDivide[0] + 1, subSize[0]), 0, chessboardSize[0]) + subChessboardBlockY = np.clip ( np.arange(0, subSize[1] * subDivide[1] + 1, subSize[1]), 0, chessboardSize[1]) + + subChessboardSliceX = subChessboardBlockX.astype(np.float) * squareLength + subChessboardSliceY = 
subChessboardBlockY.astype(np.float) * squareLength + + for subXID in range(subDivide[0]): + for subYID in range(subDivide[1]): + subName = name + \ + "_X" + str(subChessboardBlockX[subXID]) + "_" + str(subChessboardBlockX[subXID+1]) + \ + "_Y" + str(subChessboardBlockY[subYID]) + "_" + str(subChessboardBlockY[subYID+1]) + + with MarkerPrinter.surface[ext.upper()]( + os.path.join(path, subName + ext), + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID] + pageBorder[0] * 2, + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID] + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID] + pageBorder[0] * 2, + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID] + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID], + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID]) + context.fill() + + for bx in range(subChessboardBlockX[subXID+1] - subChessboardBlockX[subXID]): + for by in range(subChessboardBlockY[subYID+1] - subChessboardBlockY[subYID]): + MarkerPrinter.__DrawBlock( + context = context, + chessboardSize = chessboardSize, + squareLength = squareLength, + blockX = subChessboardBlockX[subXID] + bx, + blockY = subChessboardBlockY[subYID] + by, + originX = subChessboardBlockX[subXID], + originY = subChessboardBlockY[subYID], + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "CHESS" ) + + + def __CheckArucoMarkerImage(dictionary, markerID, markerLength, borderBits=1, pageBorder=(0, 0)): + if(len(pageBorder) != 2): + raise ValueError("len(pageBorder) != 2") + else: + pageBorderX, pageBorderY = pageBorder + + if not (dictionary in MarkerPrinter.arucoDictBytesList): + raise ValueError("dictionary is not support") + + if(MarkerPrinter.arucoDictBytesList[dictionary].shape[0] <= markerID ): + raise ValueError("markerID is not in aruce dictionary") + + if(markerID < 0): + raise ValueError("markerID < 0") + + if(markerLength <= 0): + raise ValueError("markerLength <= 0") + + if(borderBits <= 0): + raise ValueError("borderBits <= 0") + + if(pageBorderX < 0): + raise ValueError("pageBorderX < 0") + + if(pageBorderY < 0): + raise ValueError("pageBorderY < 0") + + def PreviewArucoMarkerImage(dictionary, markerID, markerLength, borderBits=1, pageBorder=(0, 0), dpi=96): + MarkerPrinter.__CheckArucoMarkerImage(dictionary, markerID, markerLength, borderBits=borderBits, pageBorder=pageBorder) + + markerLength = markerLength * MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + + prevImage = None + with tempfile.TemporaryDirectory() as tmpdirname: + with MarkerPrinter.surface[".SVG"] ( + os.path.join(tmpdirname, "tempSVG.svg"), + markerLength + pageBorder[0] * 2, + markerLength + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + markerLength + pageBorder[0] * 2, + markerLength + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + markerLength, + markerLength) + context.fill() + + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + firstMarkerID 
= markerID, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "ARUCO") + + with open(os.path.join(tmpdirname, "tempSVG.svg")) as file: + prevImage = Image.open(io.BytesIO(svg2png(bytestring=file.read(), dpi=dpi))) + + return prevImage + + def GenArucoMarkerImage(filePath, dictionary, markerID, markerLength, borderBits=1, pageBorder=(0, 0)): + MarkerPrinter.__CheckArucoMarkerImage(dictionary, markerID, markerLength, borderBits=borderBits, pageBorder=pageBorder) + + markerLength = markerLength * MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + + # Check + path, nameExt = os.path.split(filePath) + name, ext = os.path.splitext(nameExt) + + if(len(path) > 0): + if not(os.path.isdir(path)): + os.makedirs(path) + + if((ext.upper() != ".SVG") and (ext.upper() != ".PS") and (ext.upper() != ".PDF")): + raise ValueError("file extention is not supported, should be: svg, ps, pdf") + + # Draw + with MarkerPrinter.surface[ext.upper()] ( + filePath, + markerLength + pageBorder[0] * 2, + markerLength + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + markerLength + pageBorder[0] * 2, + markerLength + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + markerLength, + markerLength) + context.fill() + + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + firstMarkerID = markerID, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "ARUCO") + + def __CheckCharucoMarkerImage(dictionary, chessboardSize, squareLength, markerLength, borderBits=1, subSize=None, pageBorder=(0, 0)): + if(len(chessboardSize) != 2): + raise ValueError("len(chessboardSize) != 2") + else: + sizeX, sizeY = chessboardSize + + if(len(pageBorder) != 2): + raise ValueError("len(pageBorder) != 2") + else: + pageBorderX, pageBorderY = pageBorder + + if not (dictionary in MarkerPrinter.arucoDictBytesList): + raise ValueError("dictionary is not support") + + if(MarkerPrinter.arucoDictBytesList[dictionary].shape[0] < (( sizeX * sizeY ) // 2)): + raise ValueError("aruce dictionary is not enough for your board size") + + if(sizeX <= 1): + raise ValueError("sizeX <= 1") + + if(sizeY <= 1): + raise ValueError("sizeY <= 1") + + if(squareLength <= 0): + raise ValueError("squareLength <= 0") + + if(markerLength <= 0): + raise ValueError("markerLength <= 0") + + if(squareLength < markerLength): + raise ValueError("squareLength < markerLength") + + if(borderBits <= 0): + raise ValueError("borderBits <= 0") + + if(pageBorderX < 0): + raise ValueError("pageBorderX < 0") + + if(pageBorderY < 0): + raise ValueError("pageBorderY < 0") + + if(subSize is not None): + subSizeX, subSizeY = subSize + + if(subSizeX < 0): + raise ValueError("subSizeX < 0") + + if(subSizeY < 0): + raise ValueError("subSizeY < 0") + + def PreviewCharucoMarkerImage(dictionary, chessboardSize, squareLength, markerLength, borderBits=1, pageBorder=(0, 0), dpi=96): + MarkerPrinter.__CheckCharucoMarkerImage(dictionary, chessboardSize, squareLength, markerLength, borderBits=borderBits, pageBorder=pageBorder) + + squareLength = squareLength * MarkerPrinter.ptPerMeter + markerLength = markerLength * MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + 
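+        # All geometry below is in PostScript points; ptPerMeter (72/2.54*100)
+        # converts the metre-based inputs, so one unit is 1/72 inch in the
+        # generated surface.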
+ prevImage = None + with tempfile.TemporaryDirectory() as tmpdirname: + with MarkerPrinter.surface[".SVG"] ( + os.path.join(tmpdirname, "tempSVG.svg"), + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + chessboardSize[0] * squareLength, + chessboardSize[1] * squareLength) + context.fill() + + for bx in range(chessboardSize[0]): + for by in range(chessboardSize[1]): + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + chessboardSize = chessboardSize, + squareLength = squareLength, + blockX = bx, + blockY = by, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "CHARUCO") + + with open(os.path.join(tmpdirname, "tempSVG.svg")) as file: + prevImage = Image.open(io.BytesIO(svg2png(bytestring=file.read(), dpi=dpi))) + + return prevImage + + def GenCharucoMarkerImage(filePath, dictionary, chessboardSize, squareLength, markerLength, borderBits=1, subSize=None, pageBorder=(0, 0)): + MarkerPrinter.__CheckCharucoMarkerImage(dictionary, chessboardSize, squareLength, markerLength, borderBits=borderBits, subSize=subSize, pageBorder=pageBorder) + + squareLength = squareLength * MarkerPrinter.ptPerMeter + markerLength = markerLength * MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + + # Check + path, nameExt = os.path.split(filePath) + name, ext = os.path.splitext(nameExt) + + if(len(path) > 0): + if not(os.path.isdir(path)): + os.makedirs(path) + + if((ext.upper() != ".SVG") and (ext.upper() != ".PS") and (ext.upper() != ".PDF")): + raise ValueError("file extension is not supported, should be: svg, ps, pdf") + + # Draw + with MarkerPrinter.surface[ext.upper()] ( + filePath, + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + chessboardSize[0] * squareLength + pageBorder[0] * 2, + chessboardSize[1] * squareLength + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + chessboardSize[0] * squareLength, + chessboardSize[1] * squareLength) + context.fill() + + for bx in range(chessboardSize[0]): + for by in range(chessboardSize[1]): + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + chessboardSize = chessboardSize, + squareLength = squareLength, + blockX = bx, + blockY = by, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "CHARUCO") + + if(subSize is not None): + subDivide = (\ + chessboardSize[0] // subSize[0] + int(chessboardSize[0] % subSize[0] > 0), + chessboardSize[1] // subSize[1] + int(chessboardSize[1] % subSize[1] > 0)) + + subChessboardBlockX = np.clip ( np.arange(0, subSize[0] * subDivide[0] + 1, subSize[0]), 0, chessboardSize[0]) + subChessboardBlockY = np.clip ( np.arange(0, subSize[1] * subDivide[1] + 1, subSize[1]), 0, chessboardSize[1]) + + 
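+            # Chunk grid, worked through with illustrative numbers: for
+            # chessboardSize = (16, 9) and subSize = (6, 4), subDivide is
+            # (ceil(16/6), ceil(9/4)) = (3, 3), and the clipped block
+            # boundaries become [0, 6, 12, 16] along X and [0, 4, 8, 9]
+            # along Y, so edge chunks can be smaller than subSize.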
subChessboardSliceX = subChessboardBlockX.astype(float) * squareLength + subChessboardSliceY = subChessboardBlockY.astype(float) * squareLength + + for subXID in range(subDivide[0]): + for subYID in range(subDivide[1]): + subName = name + \ + "_X" + str(subChessboardBlockX[subXID]) + "_" + str(subChessboardBlockX[subXID+1]) + \ + "_Y" + str(subChessboardBlockY[subYID]) + "_" + str(subChessboardBlockY[subYID+1]) + + with MarkerPrinter.surface[ext.upper()]( + os.path.join(path, subName + ext), + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID] + pageBorder[0] * 2, + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID] + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID] + pageBorder[0] * 2, + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID] + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID], + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID]) + context.fill() + + for bx in range(subChessboardBlockX[subXID+1] - subChessboardBlockX[subXID]): + for by in range(subChessboardBlockY[subYID+1] - subChessboardBlockY[subYID]): + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + chessboardSize = chessboardSize, + squareLength = squareLength, + blockX = subChessboardBlockX[subXID] + bx, + blockY = subChessboardBlockY[subYID] + by, + originX = subChessboardBlockX[subXID], + originY = subChessboardBlockY[subYID], + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "CHARUCO") + + def __CheckArucoGridMarkerImage(dictionary, chessboardSize, markerLength, markerSeparation, firstMarker, borderBits=1, subSize=None, pageBorder=(0, 0)): + if(len(chessboardSize) != 2): + raise ValueError("len(chessboardSize) != 2") + else: + sizeX, sizeY = chessboardSize + + if(len(pageBorder) != 2): + raise ValueError("len(pageBorder) != 2") + else: + pageBorderX, pageBorderY = pageBorder + + if not (dictionary in MarkerPrinter.arucoDictBytesList): + raise ValueError("dictionary is not supported") + + if(MarkerPrinter.arucoDictBytesList[dictionary].shape[0] < (( sizeX * sizeY ) + firstMarker)): + raise ValueError("aruco dictionary is not large enough for your board size and firstMarker") + + if(sizeX <= 1): + raise ValueError("sizeX <= 1") + + if(sizeY <= 1): + raise ValueError("sizeY <= 1") + + if(markerLength <= 0): + raise ValueError("markerLength <= 0") + + if(markerSeparation <= 0): + raise ValueError("markerSeparation <= 0") + + if(borderBits <= 0): + raise ValueError("borderBits <= 0") + + if(pageBorderX < 0): + raise ValueError("pageBorderX < 0") + + if(pageBorderY < 0): + raise ValueError("pageBorderY < 0") + + if(subSize is not None): + subSizeX, subSizeY = subSize + + if(subSizeX < 0): + raise ValueError("subSizeX < 0") + + if(subSizeY < 0): + raise ValueError("subSizeY < 0") + + def PreviewArucoGridMarkerImage(dictionary, chessboardSize, markerLength, markerSeparation, firstMarker, borderBits=1, pageBorder=(0, 0), dpi=96): + MarkerPrinter.__CheckArucoGridMarkerImage(dictionary, chessboardSize, markerLength, markerSeparation, firstMarker, borderBits=borderBits, pageBorder=pageBorder) + + markerLength = markerLength * MarkerPrinter.ptPerMeter + markerSeparation = markerSeparation * 
MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + + prevImage = None + with tempfile.TemporaryDirectory() as tmpdirname: + with MarkerPrinter.surface[".SVG"] ( + os.path.join(tmpdirname, "tempSVG.svg"), + chessboardSize[0] * markerLength + (chessboardSize[0] - 1) * markerSeparation + pageBorder[0] * 2, + chessboardSize[1] * markerLength + (chessboardSize[1] - 1) * markerSeparation + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + chessboardSize[0] * markerLength + (chessboardSize[0] - 1) * markerSeparation + pageBorder[0] * 2, + chessboardSize[1] * markerLength + (chessboardSize[1] - 1) * markerSeparation + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + chessboardSize[0] * markerLength + (chessboardSize[0] - 1) * markerSeparation, + chessboardSize[1] * markerLength + (chessboardSize[1] - 1) * markerSeparation) + context.fill() + + for bx in range(chessboardSize[0]): + for by in range(chessboardSize[1]): + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + chessboardSize = chessboardSize, + squareLength = markerLength + markerSeparation, + firstMarkerID = firstMarker, + blockX = bx, + blockY = by, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "ARUCOGRID") + + with open(os.path.join(tmpdirname, "tempSVG.svg")) as file: + prevImage = Image.open(io.BytesIO(svg2png(bytestring=file.read(), dpi=dpi))) + + return prevImage + + def GenArucoGridMarkerImage(filePath, dictionary, chessboardSize, markerLength, markerSeparation, firstMarker, borderBits=1, subSize=None, pageBorder=(0, 0)): + MarkerPrinter.__CheckArucoGridMarkerImage(dictionary, chessboardSize, markerLength, markerSeparation, firstMarker, borderBits=borderBits, subSize=subSize, pageBorder=pageBorder) + + markerLength = markerLength * MarkerPrinter.ptPerMeter + markerSeparation = markerSeparation * MarkerPrinter.ptPerMeter + pageBorder = (pageBorder[0] * MarkerPrinter.ptPerMeter, pageBorder[1] * MarkerPrinter.ptPerMeter) + + # Check + path, nameExt = os.path.split(filePath) + name, ext = os.path.splitext(nameExt) + + if(len(path) > 0): + if not(os.path.isdir(path)): + os.makedirs(path) + + if((ext.upper() != ".SVG") and (ext.upper() != ".PS") and (ext.upper() != ".PDF")): + raise ValueError("file extension is not supported, should be: svg, ps, pdf") + + # Draw + with MarkerPrinter.surface[ext.upper()] ( + filePath, + chessboardSize[0] * markerLength + (chessboardSize[0] - 1) * markerSeparation + pageBorder[0] * 2, + chessboardSize[1] * markerLength + (chessboardSize[1] - 1) * markerSeparation + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + chessboardSize[0] * markerLength + (chessboardSize[0] - 1) * markerSeparation + pageBorder[0] * 2, + chessboardSize[1] * markerLength + (chessboardSize[1] - 1) * markerSeparation + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + chessboardSize[0] * markerLength + (chessboardSize[0] - 1) * markerSeparation, + chessboardSize[1] * markerLength + (chessboardSize[1] - 1) * markerSeparation) + context.fill() + 
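+            # Page layout, for reference: each axis of the grid spans
+            #   N * markerLength + (N - 1) * markerSeparation + 2 * pageBorder.
+            # With illustrative numbers, 16 markers of 0.07 m with 0.02 m gaps
+            # give 16 * 0.07 + 15 * 0.02 = 1.42 m before the page border.
+ 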
for bx in range(chessboardSize[0]): + for by in range(chessboardSize[1]): + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + chessboardSize = chessboardSize, + squareLength = markerLength + markerSeparation, + firstMarkerID = firstMarker, + blockX = bx, + blockY = by, + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "ARUCOGRID") + + if(subSize is not None): + subDivide = (\ + chessboardSize[0] // subSize[0] + int(chessboardSize[0] % subSize[0] > 0), + chessboardSize[1] // subSize[1] + int(chessboardSize[1] % subSize[1] > 0)) + + subChessboardBlockX = np.clip ( np.arange(0, subSize[0] * subDivide[0] + 1, subSize[0]), 0, chessboardSize[0]) + subChessboardBlockY = np.clip ( np.arange(0, subSize[1] * subDivide[1] + 1, subSize[1]), 0, chessboardSize[1]) + + subChessboardSliceX = subChessboardBlockX.astype(float) * (markerLength + markerSeparation) + subChessboardSliceY = subChessboardBlockY.astype(float) * (markerLength + markerSeparation) + + subChessboardSliceX[-1] -= markerSeparation + subChessboardSliceY[-1] -= markerSeparation + + for subXID in range(subDivide[0]): + for subYID in range(subDivide[1]): + subName = name + \ + "_X" + str(subChessboardBlockX[subXID]) + "_" + str(subChessboardBlockX[subXID+1]) + \ + "_Y" + str(subChessboardBlockY[subYID]) + "_" + str(subChessboardBlockY[subYID+1]) + + with MarkerPrinter.surface[ext.upper()]( + os.path.join(path, subName + ext), + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID] + pageBorder[0] * 2, + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID] + pageBorder[1] * 2) as surface: + context = cairo.Context(surface) + + context.set_source_rgba(0.5, 0.5, 0.5, 1.0) + context.rectangle(0, 0, + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID] + pageBorder[0] * 2, + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID] + pageBorder[1] * 2) + context.fill() + + context.set_source_rgba(1.0, 1.0, 1.0, 1.0) + context.rectangle(pageBorder[0], pageBorder[1], + subChessboardSliceX[subXID+1] - subChessboardSliceX[subXID], + subChessboardSliceY[subYID+1] - subChessboardSliceY[subYID]) + context.fill() + + for bx in range(subChessboardBlockX[subXID+1] - subChessboardBlockX[subXID]): + for by in range(subChessboardBlockY[subYID+1] - subChessboardBlockY[subYID]): + MarkerPrinter.__DrawBlock( + context = context, + dictionary = dictionary, + markerLength = markerLength, + borderBits = borderBits, + chessboardSize = chessboardSize, + squareLength = markerLength + markerSeparation, + firstMarkerID = firstMarker, + blockX = subChessboardBlockX[subXID] + bx, + blockY = subChessboardBlockY[subYID] + by, + originX = subChessboardBlockX[subXID], + originY = subChessboardBlockY[subYID], + pageBorderX = pageBorder[0], + pageBorderY = pageBorder[1], + mode = "ARUCOGRID") + +if __name__ == '__main__': + parser = ArgumentParser() + + # Save marker image parameters + chessGroup = parser.add_argument_group('chess', 'Chessboard') + arucoGroup = parser.add_argument_group('aruco', 'ArUco') + arucoGridGroup = parser.add_argument_group('aruco_grid', 'ArUco grid') + charucoGroup = parser.add_argument_group('charuco', 'ChArUco') + exclusiveGroup = parser.add_mutually_exclusive_group() + + exclusiveGroup.add_argument( + "--chess", action='store_true', default=False, + help="Choose to save chessboard marker") + + exclusiveGroup.add_argument( + "--aruco", action='store_true', default=False, + help="Choose to save ArUco marker") + + exclusiveGroup.add_argument( + "--aruco_grid", 
action='store_true', default=False, + help="Choose to save ArUco grid marker") + + exclusiveGroup.add_argument( + "--charuco", action='store_true', default=False, + help="Choose to save ChArUco marker") + + # Utility functions parameters + exclusiveGroup.add_argument( + "--generate", dest="arucoDataFileName", + help="Generate aruco data to FILE", metavar="FILE") + + exclusiveGroup.add_argument( + "--list_dictionary", action='store_true', default=False, + help="List predefined aruco dictionaries") + + # Parameters + # fileName + parser.add_argument( + "--file", dest="fileName", default="./image.pdf", + help="Save marker image to FILE", metavar="FILE") + for group in [chessGroup, arucoGroup, arucoGridGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_file", dest="fileName", + help="Save marker image to FILE", metavar="FILE") + + # dictionary + parser.add_argument( + "--dictionary", dest="dictionary", default="DICT_ARUCO_ORIGINAL", + help="Generate marker via predefined DICTIONARY aruco dictionary", metavar="DICTIONARY") + for group in [arucoGroup, arucoGridGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_dictionary", dest="dictionary", + help="Generate marker via predefined DICTIONARY aruco dictionary", metavar="DICTIONARY") + + # size + parser.add_argument( + "--size_x", dest="sizeX", default="16", + help="Save marker image with N board width", metavar="N") + parser.add_argument( + "--size_y", dest="sizeY", default="9", + help="Save marker image with N board height", metavar="N") + + for group in [chessGroup, arucoGridGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_size_x", dest="sizeX", + help="Save marker image with N board width", metavar="N") + group.add_argument( + "--" + group.title + "_size_y", dest="sizeY", + help="Save marker image with N board height", metavar="N") + + # length + parser.add_argument( + "--square_length", dest="squareLength", default="0.09", + help="Save marker image with L square length (Unit: meter)", metavar="L") + parser.add_argument( + "--marker_length", dest="markerLength", default="0.07", + help="Save marker image with L marker length (Unit: meter)", metavar="L") + parser.add_argument( + "--marker_separation", dest="markerSeparation", default="0.02", + help="Save marker image with L separation length (Unit: meter)", metavar="L") + + for group in [chessGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_square_length", dest="squareLength", + help="Save marker image with L blocks length (Unit: meter)", metavar="L") + + for group in [arucoGroup, arucoGridGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_marker_length", dest="markerLength", + help="Save marker image with L marker length (Unit: meter)", metavar="L") + + for group in [arucoGridGroup]: + group.add_argument( + "--" + group.title + "_marker_separation", dest="markerSeparation", + help="Save marker image with L gap length (Unit: meter)", metavar="L") + + # else + parser.add_argument( + "--marker_id", dest="markerID", default="0", + help="Save marker image with ID marker", metavar="ID") + parser.add_argument( + "--first_marker", dest="firstMarker", default="0", + help="Save marker image that starts with ID marker", metavar="ID") + parser.add_argument( + "--border_bits", dest="borderBits", default="1", + help="Save marker image with N border size", metavar="N") + + for group in [arucoGroup]: + group.add_argument( + "--" + group.title + "_marker_id", dest="markerID", + help="Save marker image with ID marker", 
metavar="ID") + + for group in [arucoGridGroup]: + group.add_argument( + "--" + group.title + "_first_marker", dest="firstMarker", + help="Save marker image that start with ID marker", metavar="ID") + + for group in [arucoGroup, arucoGridGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_border_bits", dest="borderBits", + help="Save marker image with N border size", metavar="N") + + # sub size + parser.add_argument( + "--sub_size_x", dest="subSizeX", default="0", + help="Save marker image with N chuck width", metavar="N") + parser.add_argument( + "--sub_size_y", dest="subSizeY", default="0", + help="Save marker image with N chuck height", metavar="N") + + for group in [chessGroup, arucoGridGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_sub_size_x", dest="subSizeX", + help="Save marker image with N chuck width", metavar="N") + group.add_argument( + "--" + group.title + "_sub_size_y", dest="subSizeY", + help="Save marker image with N chuck height", metavar="N") + + # page border + parser.add_argument( + "--page_border_x", dest="pageBorderX", default="0", + help="Save with page border width L length (Unit: meter)", metavar="L") + parser.add_argument( + "--page_border_y", dest="pageBorderY", default="0", + help="Save with page border height L length (Unit: meter)", metavar="L") + + for group in [chessGroup, arucoGroup, arucoGridGroup, charucoGroup]: + group.add_argument( + "--" + group.title + "_page_border_x", dest="pageBorderX", default="0", + help="Save with page border width L length (Unit: meter)", metavar="L") + group.add_argument( + "--" + group.title + "_page_border_y", dest="pageBorderY", default="0", + help="Save with page border height L length (Unit: meter)", metavar="L") + + # Run + args = parser.parse_args() + + if(args.arucoDataFileName is not None): + print("Generate aruco data to: " + args.arucoDataFileName) + SaveArucoDictBytesList(args.arucoDataFileName) + + elif(args.list_dictionary): + print("List predefined aruco dictionary") + for i in MarkerPrinter.arucoDictBytesList.keys(): + print(i) + + elif(args.chess): + try: + sizeX = int(args.sizeX) + sizeY = int(args.sizeY) + squareLength = float(args.squareLength) + subSizeX = int(args.subSizeX) + subSizeY = int(args.subSizeY) + pageBorderX = float(args.pageBorderX) + pageBorderY = float(args.pageBorderY) + except ValueError as e: + warnings.warn(str(e)) + else: + print("Save chessboard marker with parms: " + \ + str({ \ + "fileName": args.fileName, \ + "sizeX": sizeX, \ + "sizeY": sizeY, \ + "squareLength": squareLength, \ + "subSizeX": subSizeX, \ + "subSizeY": subSizeY, \ + "pageBorderX": pageBorderX, \ + "pageBorderY": pageBorderY, \ + })) + + subSize = None + + if(subSizeX > 0): + if(subSizeY > 0): + subSize = (subSizeX, subSizeY) + else: + subSize = (subSizeX, sizeY) + else: + if(subSizeY > 0): + subSize = (sizeX, subSizeY) + else: + subSize = None + + # Gen + MarkerPrinter.GenChessMarkerImage(args.fileName, (sizeX, sizeY), squareLength, subSize = subSize, pageBorder = (pageBorderX, pageBorderY)) + + elif(args.aruco): + try: + markerLength = float(args.markerLength) + markerID = int(args.markerID) + borderBits = int(args.borderBits) + pageBorderX = float(args.pageBorderX) + pageBorderY = float(args.pageBorderY) + except ValueError as e: + warnings.warn(str(e)) + else: + print("Save ArUco marker with parms: " + \ + str({ \ + "fileName": args.fileName, \ + "dictionary": args.dictionary, \ + "markerLength": markerLength, \ + "markerID": markerID, \ + "borderBits": borderBits, \ + 
"pageBorderX": pageBorderX, \ + "pageBorderY": pageBorderY, \ + })) + + # Gen + MarkerPrinter.GenArucoMarkerImage(args.fileName, args.dictionary, markerID, markerLength, borderBits=borderBits, pageBorder = (pageBorderX, pageBorderY)) + + elif(args.aruco_grid): + try: + sizeX = int(args.sizeX) + sizeY = int(args.sizeY) + markerLength = float(args.markerLength) + markerSeparation = float(args.markerSeparation) + firstMarker = int(args.firstMarker) + borderBits = int(args.borderBits) + subSizeX = int(args.subSizeX) + subSizeY = int(args.subSizeY) + pageBorderX = float(args.pageBorderX) + pageBorderY = float(args.pageBorderY) + except ValueError as e: + warnings.warn(str(e)) + else: + print("Save ArUco grid marker with parms: " + \ + str({ \ + "fileName": args.fileName, \ + "dictionary": args.dictionary, \ + "sizeX": sizeX, \ + "sizeY": sizeY, \ + "markerLength": markerLength, \ + "markerSeparation": markerSeparation, \ + "firstMarker": firstMarker, \ + "borderBits": borderBits, \ + "subSizeX": subSizeX, \ + "subSizeY": subSizeY, \ + "pageBorderX": pageBorderX, \ + "pageBorderY": pageBorderY, \ + })) + + subSize = None + + if(subSizeX > 0): + if(subSizeY > 0): + subSize = (subSizeX, subSizeY) + else: + subSize = (subSizeX, sizeY) + else: + if(subSizeY > 0): + subSize = (sizeX, subSizeY) + else: + subSize = None + + # Gen + MarkerPrinter.GenArucoGridMarkerImage(args.fileName, args.dictionary, (sizeX, sizeY), markerLength, markerSeparation, firstMarker, borderBits=borderBits, subSize=subSize, pageBorder = (pageBorderX, pageBorderY)) + + elif(args.charuco): + try: + sizeX = int(args.sizeX) + sizeY = int(args.sizeY) + squareLength = float(args.squareLength) + markerLength = float(args.markerLength) + borderBits = int(args.borderBits) + subSizeX = int(args.subSizeX) + subSizeY = int(args.subSizeY) + pageBorderX = float(args.pageBorderX) + pageBorderY = float(args.pageBorderY) + except ValueError as e: + warnings.warn(str(e)) + else: + print("Save ChArUco marker with parms: " + \ + str({ \ + "fileName": args.fileName, \ + "dictionary": args.dictionary, \ + "sizeX": sizeX, \ + "sizeY": sizeY, \ + "squareLength": squareLength, \ + "markerLength": markerLength, \ + "borderBits": borderBits, \ + "subSizeX": subSizeX, \ + "subSizeY": subSizeY, \ + "pageBorderX": pageBorderX, \ + "pageBorderY": pageBorderY, \ + })) + + subSize = None + + if(subSizeX > 0): + if(subSizeY > 0): + subSize = (subSizeX, subSizeY) + else: + subSize = (subSizeX, sizeY) + else: + if(subSizeY > 0): + subSize = (sizeX, subSizeY) + else: + subSize = None + + # Gen + MarkerPrinter.GenCharucoMarkerImage(args.fileName, args.dictionary, (sizeX, sizeY), squareLength, markerLength, borderBits=borderBits, subSize=subSize, pageBorder = (pageBorderX, pageBorderY)) + + else: + parser.print_help() diff --git a/modules/aruco/misc/pattern_generator/MarkerPrinterGUI.py b/modules/aruco/misc/pattern_generator/MarkerPrinterGUI.py new file mode 100644 index 00000000000..bdd6d3cb0f0 --- /dev/null +++ b/modules/aruco/misc/pattern_generator/MarkerPrinterGUI.py @@ -0,0 +1,565 @@ +#!/usr/bin/env python3 + +# SPDX-License-Identifier: BSD-3-Clause +# +# Copyright (c) 2019, Josh Chien. All rights reserved. 
+ +from MarkerPrinter import * + +import tkinter as tk +from tkinter import ttk, filedialog, messagebox + +import time + +import PIL.Image +import PIL.ImageTk + +class MarkerPrinterGUI: + + def VisDPI(self, shape): + scale0 = float(self.displayShape[0]) / float(shape[0]) + scale1 = float(self.displayShape[1]) / float(shape[1]) + if(scale0 > scale1): + return scale1 * 96.0 + else: + return scale0 * 96.0 + + def OnShowingHelpGithub(self): + messagebox.showinfo("Github", + "https://github.com/dogod621/OpenCVMarkerPrinter") + + def OnCloseWindow(self): + if(self.window is not None): + if messagebox.askokcancel("Quit", "Do you want to quit?"): + self.window.destroy() + self.window = None + + def OnSelectCharucoMarkerDictionary(self, pDictName): + self.charucoMarkerDictionaryStr.set(pDictName) + + def __SaveMarker(GenMarkerImageCallback, *args, **kwargs): + + if(kwargs.get("subSize",None) is not None): + subSizeX, subSizeY = kwargs["subSize"] + + # A sub size of 0 disables chunking along that axis; the board size + # is not known here, so only chunk when both extents are positive + # (the previous fallback referenced undefined names) + if((subSizeX > 0) and (subSizeY > 0)): + kwargs["subSize"] = (subSizeX, subSizeY) + else: + kwargs["subSize"] = None + + try: + askFileName = filedialog.asksaveasfilename(initialdir = os.path.abspath("./"), title = "Output", filetypes = (\ + ("scalable vector graphics files","*.svg"), \ + ("portable document format files","*.pdf"), \ + ("post script files","*.ps")), + defaultextension="*.*") + + if (askFileName): + GenMarkerImageCallback(askFileName, *args, **kwargs) + + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Save marker failed") + return + + def OnPreviewOrSaveCharucoMarker(self, askSave = False): + try: + sizeX = int(self.charucoMarkerChessboardSizeXStr.get()) + sizeY = int(self.charucoMarkerChessboardSizeYStr.get()) + squareLength = float(self.charucoMarkerSquareLengthStr.get()) + markerLength = float(self.charucoMarkerMarkerLengthStr.get()) + borderBits = int(self.charucoMarkerBorderBitsStr.get()) + dictionary = self.charucoMarkerDictionaryStr.get() + subSizeX = int(self.charucoMarkerSaveSubSizeXStr.get()) + subSizeY = int(self.charucoMarkerSaveSubSizeYStr.get()) + pageBorderX = float(self.charucoMarkerSavePageBorderXStr.get()) + pageBorderY = float(self.charucoMarkerSavePageBorderYStr.get()) + except ValueError as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Invalid parameters") + return + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Failed to get parameters") + return + + # Preview + try: + dpi = self.VisDPI(((sizeY * squareLength + pageBorderY * 2) * MarkerPrinter.ptPerMeter, (sizeX * squareLength + pageBorderX * 2) * MarkerPrinter.ptPerMeter)) + tkImage = PIL.ImageTk.PhotoImage(image = MarkerPrinter.PreviewCharucoMarkerImage(dictionary, (sizeX, sizeY), squareLength, markerLength, borderBits=borderBits, pageBorder = (pageBorderX, pageBorderY), dpi=dpi)) + self.charucoMarkerImageLabel.imgtk = tkImage + self.charucoMarkerImageLabel.config(image=tkImage) + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Create marker failed") + return + + # Save + if(askSave): + MarkerPrinterGUI.__SaveMarker(MarkerPrinter.GenCharucoMarkerImage, \ + dictionary, (sizeX, sizeY), squareLength, markerLength, borderBits=borderBits, subSize = (subSizeX, subSizeY), pageBorder = (pageBorderX, pageBorderY)) + + def OnPreviewCharucoMarker(self): + self.OnPreviewOrSaveCharucoMarker(askSave = False) + + def 
OnSaveCharucoMarker(self): + self.OnPreviewOrSaveCharucoMarker(askSave = True) + + def InitCharucoMarkerTab(self): + self.charucoMarkerUIFrame = ttk.Frame(self.charucoMarkerTab) + self.charucoMarkerImageTab = ttk.Frame(self.charucoMarkerTab) + self.charucoMarkerUIFrame2 = ttk.Frame(self.charucoMarkerTab) + + self.charucoMarkerUIFrame.grid(row=0, column=0, sticky = tk.NSEW) + self.charucoMarkerImageTab.grid(row=1, column=0, sticky = tk.NSEW) + self.charucoMarkerUIFrame2.grid(row=2, column=0, sticky = tk.NSEW) + + self.charucoMarkerImageLabel = tk.Label(self.charucoMarkerImageTab) + self.charucoMarkerImageLabel.grid(row=0, column=0, sticky = tk.NSEW) + + tk.Label(self.charucoMarkerUIFrame, text="dictionary").grid(row=0, column=0, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame, text="chessboardSizeX").grid(row=0, column=1, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame, text="chessboardSizeY").grid(row=0, column=2, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame, text="squareLength (Unit: Meter)").grid(row=0, column=3, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame, text="markerLength (Unit: Meter)").grid(row=0, column=4, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame, text="borderBits").grid(row=0, column=5, sticky = tk.NSEW) + + self.charucoMarkerDictionaryStr = tk.StringVar() + self.charucoMarkerChessboardSizeXStr = tk.StringVar() + self.charucoMarkerChessboardSizeXStr.set("16") + self.charucoMarkerChessboardSizeYStr = tk.StringVar() + self.charucoMarkerChessboardSizeYStr.set("9") + self.charucoMarkerSquareLengthStr = tk.StringVar() + self.charucoMarkerSquareLengthStr.set("0.09") + self.charucoMarkerMarkerLengthStr = tk.StringVar() + self.charucoMarkerMarkerLengthStr.set("0.07") + self.charucoMarkerBorderBitsStr = tk.StringVar() + self.charucoMarkerBorderBitsStr.set("1") + + self.charucoMarkerDictionaryMenue = tk.OptionMenu(self.charucoMarkerUIFrame, self.charucoMarkerDictionaryStr, "DICT_ARUCO_ORIGINAL", command = self.OnSelectCharucoMarkerDictionary) + self.charucoMarkerDictionaryMenue.grid(row=1, column=0, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame, textvariable=self.charucoMarkerChessboardSizeXStr).grid(row=1, column=1, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame, textvariable=self.charucoMarkerChessboardSizeYStr).grid(row=1, column=2, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame, textvariable=self.charucoMarkerSquareLengthStr).grid(row=1, column=3, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame, textvariable=self.charucoMarkerMarkerLengthStr).grid(row=1, column=4, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame, textvariable=self.charucoMarkerBorderBitsStr).grid(row=1, column=5, sticky = tk.NSEW) + + tk.Button(self.charucoMarkerUIFrame2, text = "Preview", command = self.OnPreviewCharucoMarker).grid(row=1, column=0, sticky = tk.NSEW) + tk.Button(self.charucoMarkerUIFrame2, text = "Save", command = self.OnSaveCharucoMarker).grid(row=1, column=1, sticky = tk.NSEW) + + tk.Label(self.charucoMarkerUIFrame2, text="Save options:").grid(row=0, column=2, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="(set 0 to disable)").grid(row=1, column=2, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="subSizeX").grid(row=0, column=3, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="subSizeY").grid(row=0, column=4, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="Divide to chunks, chunk sizeX").grid(row=2, column=3, sticky = tk.NSEW) + 
tk.Label(self.charucoMarkerUIFrame2, text="Divide to chunks, chunk sizeY").grid(row=2, column=4, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="pageBorderX (Unit: Meter)").grid(row=0, column=5, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="pageBorderY (Unit: Meter)").grid(row=0, column=6, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="Border of page").grid(row=2, column=5, sticky = tk.NSEW) + tk.Label(self.charucoMarkerUIFrame2, text="Border of page").grid(row=2, column=6, sticky = tk.NSEW) + + self.charucoMarkerSaveSubSizeXStr = tk.StringVar() + self.charucoMarkerSaveSubSizeXStr.set("0") + self.charucoMarkerSaveSubSizeYStr = tk.StringVar() + self.charucoMarkerSaveSubSizeYStr.set("0") + self.charucoMarkerSavePageBorderXStr = tk.StringVar() + self.charucoMarkerSavePageBorderXStr.set("0.02") + self.charucoMarkerSavePageBorderYStr = tk.StringVar() + self.charucoMarkerSavePageBorderYStr.set("0.02") + + tk.Entry(self.charucoMarkerUIFrame2, textvariable=self.charucoMarkerSaveSubSizeXStr).grid(row=1, column=3, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame2, textvariable=self.charucoMarkerSaveSubSizeYStr).grid(row=1, column=4, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame2, textvariable=self.charucoMarkerSavePageBorderXStr).grid(row=1, column=5, sticky = tk.NSEW) + tk.Entry(self.charucoMarkerUIFrame2, textvariable=self.charucoMarkerSavePageBorderYStr).grid(row=1, column=6, sticky = tk.NSEW) + + self.charucoMarkerDictionaryMenue['menu'].delete(0, 'end') + for dictName in self.dictList: + self.charucoMarkerDictionaryMenue['menu'].add_command(label=dictName, command=tk._setit(self.charucoMarkerDictionaryStr, dictName, self.OnSelectCharucoMarkerDictionary)) + + self.OnSelectCharucoMarkerDictionary("DICT_ARUCO_ORIGINAL") + + def OnSelectArucoGridMarkerDictionary(self, pDictName): + self.arucoGridMarkerDictionaryStr.set(pDictName) + + def OnPreviewOrSaveArucoGridMarker(self, askSave = False): + try: + markersX = int(self.arucoGridMarkerMarkersXStr.get()) + markersY = int(self.arucoGridMarkerMarkersYStr.get()) + markerLength = float(self.arucoGridMarkerMarkerLengthStr.get()) + markerSeparation = float(self.arucoGridMarkerMarkerSeparationStr.get()) + borderBits = int(self.arucoGridMarkerBorderBitsStr.get()) + firstMarker = int(self.arucoGridMarkerFirstMarkerStr.get()) + dictionary = self.arucoGridMarkerDictionaryStr.get() + subSizeX = int(self.arucoGridMarkerSaveSubSizeXStr.get()) + subSizeY = int(self.arucoGridMarkerSaveSubSizeYStr.get()) + pageBorderX = float(self.arucoGridMarkerSavePageBorderXStr.get()) + pageBorderY = float(self.arucoGridMarkerSavePageBorderYStr.get()) + except ValueError as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Invalid parameters") + return + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Failed to get parameters") + return + + # Preview + try: + dpi=self.VisDPI(((markersY * markerLength + (markersY - 1) * markerSeparation + pageBorderY * 2) * MarkerPrinter.ptPerMeter, (markersX * markerLength + (markersX - 1) * markerSeparation + pageBorderX * 2) * MarkerPrinter.ptPerMeter)) + tkImage = PIL.ImageTk.PhotoImage(image = MarkerPrinter.PreviewArucoGridMarkerImage(dictionary, (markersX, markersY), markerLength, markerSeparation, firstMarker, borderBits=borderBits, pageBorder = (pageBorderX, pageBorderY), dpi=dpi)) + self.arucoGridMarkerImageLabel.imgtk = tkImage + self.arucoGridMarkerImageLabel.config(image=tkImage) + except Exception as e: + warnings.warn(str(e)) + 
messagebox.showinfo("Error", "create marker failed") + return + + # Save + if(askSave): + MarkerPrinterGUI.__SaveMarker(MarkerPrinter.GenArucoGridMarkerImage, \ + dictionary, (markersX, markersY), markerLength, markerSeparation, firstMarker, borderBits=borderBits, subSize = (subSizeX, subSizeY), pageBorder = (pageBorderX, pageBorderY)) + + def OnPreviewArucoGridMarker(self): + self.OnPreviewOrSaveArucoGridMarker(askSave = False) + + def OnSaveArucoGridMarker(self): + self.OnPreviewOrSaveArucoGridMarker(askSave = True) + + def InitArucoGridMarkerTab(self): + self.arucoGridMarkerUIFrame = ttk.Frame(self.arucoGridMarkerTab) + self.arucoGridMarkerImageTab = ttk.Frame(self.arucoGridMarkerTab) + self.arucoGridMarkerUIFrame2 = ttk.Frame(self.arucoGridMarkerTab) + + self.arucoGridMarkerUIFrame.grid(row=0, column=0, sticky = tk.NSEW) + self.arucoGridMarkerImageTab.grid(row=1, column=0, sticky = tk.NSEW) + self.arucoGridMarkerUIFrame2.grid(row=2, column=0, sticky = tk.NSEW) + + self.arucoGridMarkerImageLabel = tk.Label(self.arucoGridMarkerImageTab) + self.arucoGridMarkerImageLabel.grid(row=0, column=0, sticky = tk.NSEW) + + tk.Label(self.arucoGridMarkerUIFrame, text="dictionary").grid(row=0, column=0, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame, text="markersX").grid(row=0, column=1, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame, text="markersY").grid(row=0, column=2, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame, text="markerLength (Unit: Meter)").grid(row=0, column=3, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame, text="markerSeparation (Unit: Meter)").grid(row=0, column=4, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame, text="firstMarker").grid(row=0, column=5, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame, text="borderBits").grid(row=0, column=6, sticky = tk.NSEW) + + self.arucoGridMarkerDictionaryStr = tk.StringVar() + self.arucoGridMarkerMarkersXStr = tk.StringVar() + self.arucoGridMarkerMarkersXStr.set("16") + self.arucoGridMarkerMarkersYStr = tk.StringVar() + self.arucoGridMarkerMarkersYStr.set("9") + self.arucoGridMarkerMarkerLengthStr = tk.StringVar() + self.arucoGridMarkerMarkerLengthStr.set("0.07") + self.arucoGridMarkerMarkerSeparationStr = tk.StringVar() + self.arucoGridMarkerMarkerSeparationStr.set("0.02") + self.arucoGridMarkerFirstMarkerStr = tk.StringVar() + self.arucoGridMarkerFirstMarkerStr.set("0") + self.arucoGridMarkerBorderBitsStr = tk.StringVar() + self.arucoGridMarkerBorderBitsStr.set("1") + + self.arucoGridMarkerDictionaryMenue = tk.OptionMenu(self.arucoGridMarkerUIFrame, self.arucoGridMarkerDictionaryStr, "DICT_ARUCO_ORIGINAL", command = self.OnSelectArucoGridMarkerDictionary) + self.arucoGridMarkerDictionaryMenue.grid(row=1, column=0, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame, textvariable=self.arucoGridMarkerMarkersXStr).grid(row=1, column=1, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame, textvariable=self.arucoGridMarkerMarkersYStr).grid(row=1, column=2, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame, textvariable=self.arucoGridMarkerMarkerLengthStr).grid(row=1, column=3, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame, textvariable=self.arucoGridMarkerMarkerSeparationStr).grid(row=1, column=4, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame, textvariable=self.arucoGridMarkerFirstMarkerStr).grid(row=1, column=5, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame, textvariable=self.arucoGridMarkerBorderBitsStr).grid(row=1, column=6, sticky = 
tk.NSEW) + + tk.Button(self.arucoGridMarkerUIFrame2, text = "Preview", command = self.OnPreviewArucoGridMarker).grid(row=1, column=0, sticky = tk.NSEW) + tk.Button(self.arucoGridMarkerUIFrame2, text = "Save", command = self.OnSaveArucoGridMarker).grid(row=1, column=1, sticky = tk.NSEW) + + tk.Label(self.arucoGridMarkerUIFrame2, text="Save options:").grid(row=0, column=2, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="(set 0 to disable)").grid(row=1, column=2, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="subSizeX").grid(row=0, column=3, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="subSizeY").grid(row=0, column=4, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="Divide to chunks, chunk sizeX").grid(row=2, column=3, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="Divide to chunks, chunk sizeY").grid(row=2, column=4, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="pageBorderX (Unit: Meter)").grid(row=0, column=5, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="pageBorderY (Unit: Meter)").grid(row=0, column=6, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="Border of page").grid(row=2, column=5, sticky = tk.NSEW) + tk.Label(self.arucoGridMarkerUIFrame2, text="Border of page").grid(row=2, column=6, sticky = tk.NSEW) + + self.arucoGridMarkerSaveSubSizeXStr = tk.StringVar() + self.arucoGridMarkerSaveSubSizeXStr.set("0") + self.arucoGridMarkerSaveSubSizeYStr = tk.StringVar() + self.arucoGridMarkerSaveSubSizeYStr.set("0") + self.arucoGridMarkerSavePageBorderXStr = tk.StringVar() + self.arucoGridMarkerSavePageBorderXStr.set("0.02") + self.arucoGridMarkerSavePageBorderYStr = tk.StringVar() + self.arucoGridMarkerSavePageBorderYStr.set("0.02") + + tk.Entry(self.arucoGridMarkerUIFrame2, textvariable=self.arucoGridMarkerSaveSubSizeXStr).grid(row=1, column=3, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame2, textvariable=self.arucoGridMarkerSaveSubSizeYStr).grid(row=1, column=4, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame2, textvariable=self.arucoGridMarkerSavePageBorderXStr).grid(row=1, column=5, sticky = tk.NSEW) + tk.Entry(self.arucoGridMarkerUIFrame2, textvariable=self.arucoGridMarkerSavePageBorderYStr).grid(row=1, column=6, sticky = tk.NSEW) + + self.arucoGridMarkerDictionaryMenue['menu'].delete(0, 'end') + for dictName in self.dictList: + self.arucoGridMarkerDictionaryMenue['menu'].add_command(label=dictName, command=tk._setit(self.arucoGridMarkerDictionaryStr, dictName, self.OnSelectArucoGridMarkerDictionary)) + + self.OnSelectArucoGridMarkerDictionary("DICT_ARUCO_ORIGINAL") + + def OnSelectArucoMarkerDictionary(self, pDictName): + self.arucoMarkerDictionaryStr.set(pDictName) + + def OnPreviewOrSaveArucoMarker(self, askSave = False): + try: + markerID = int(self.arucoMarkerMarkerIDStr.get()) + markerLength = float(self.arucoMarkerMarkerLengthStr.get()) + borderBits = int(self.arucoMarkerBorderBitsStr.get()) + dictionary = self.arucoMarkerDictionaryStr.get() + pageBorderX = float(self.arucoMarkerSavePageBorderXStr.get()) + pageBorderY = float(self.arucoMarkerSavePageBorderYStr.get()) + except ValueError as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Invalid parameters") + return + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Failed to get parameters") + return + + # Preview + try: + dpi=self.VisDPI(((markerLength + pageBorderY * 2) * MarkerPrinter.ptPerMeter, (markerLength + pageBorderX 
* 2) * MarkerPrinter.ptPerMeter)) + tkImage = PIL.ImageTk.PhotoImage(image = MarkerPrinter.PreviewArucoMarkerImage(dictionary, markerID, markerLength, borderBits=borderBits, pageBorder = (pageBorderX, pageBorderY), dpi=dpi)) + self.arucoMarkerImageLabel.imgtk = tkImage + self.arucoMarkerImageLabel.config(image=tkImage) + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Create marker failed") + return + + # Save + if(askSave): + MarkerPrinterGUI.__SaveMarker(MarkerPrinter.GenArucoMarkerImage, \ + dictionary, markerID, markerLength, borderBits=borderBits, pageBorder = (pageBorderX, pageBorderY)) + + def OnPreviewArucoMarker(self): + self.OnPreviewOrSaveArucoMarker(askSave = False) + + def OnSaveArucoMarker(self): + self.OnPreviewOrSaveArucoMarker(askSave = True) + + def InitArucoMarkerTab(self): + self.arucoMarkerUIFrame = ttk.Frame(self.arucoMarkerTab) + self.arucoMarkerImageTab = ttk.Frame(self.arucoMarkerTab) + self.arucoMarkerUIFrame2 = ttk.Frame(self.arucoMarkerTab) + + self.arucoMarkerUIFrame.grid(row=0, column=0, sticky = tk.NSEW) + self.arucoMarkerImageTab.grid(row=1, column=0, sticky = tk.NSEW) + self.arucoMarkerUIFrame2.grid(row=2, column=0, sticky = tk.NSEW) + + self.arucoMarkerImageLabel = tk.Label(self.arucoMarkerImageTab) + self.arucoMarkerImageLabel.grid(row=0, column=0, sticky = tk.NSEW) + + tk.Label(self.arucoMarkerUIFrame, text="dictionary").grid(row=0, column=0, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame, text="markerID").grid(row=0, column=1, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame, text="markerLength (Unit: Meter)").grid(row=0, column=2, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame, text="borderBits").grid(row=0, column=3, sticky = tk.NSEW) + + self.arucoMarkerDictionaryStr = tk.StringVar() + self.arucoMarkerMarkerIDStr = tk.StringVar() + self.arucoMarkerMarkerIDStr.set("0") + self.arucoMarkerMarkerLengthStr = tk.StringVar() + self.arucoMarkerMarkerLengthStr.set("0.07") + self.arucoMarkerBorderBitsStr = tk.StringVar() + self.arucoMarkerBorderBitsStr.set("1") + + self.arucoMarkerDictionaryMenue = tk.OptionMenu(self.arucoMarkerUIFrame, self.arucoMarkerDictionaryStr, "DICT_ARUCO_ORIGINAL", command = self.OnSelectArucoMarkerDictionary) + self.arucoMarkerDictionaryMenue.grid(row=1, column=0, sticky = tk.NSEW) + tk.Entry(self.arucoMarkerUIFrame, textvariable=self.arucoMarkerMarkerIDStr).grid(row=1, column=1, sticky = tk.NSEW) + tk.Entry(self.arucoMarkerUIFrame, textvariable=self.arucoMarkerMarkerLengthStr).grid(row=1, column=2, sticky = tk.NSEW) + tk.Entry(self.arucoMarkerUIFrame, textvariable=self.arucoMarkerBorderBitsStr).grid(row=1, column=3, sticky = tk.NSEW) + + tk.Button(self.arucoMarkerUIFrame2, text = "Preview", command = self.OnPreviewArucoMarker).grid(row=0, column=0, sticky = tk.NSEW) + tk.Button(self.arucoMarkerUIFrame2, text = "Save", command = self.OnSaveArucoMarker).grid(row=0, column=1, sticky = tk.NSEW) + + tk.Label(self.arucoMarkerUIFrame2, text="Save options:").grid(row=0, column=2, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame2, text="(set 0 to disable)").grid(row=1, column=2, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame2, text="pageBorderX (Unit: Meter)").grid(row=0, column=3, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame2, text="pageBorderY (Unit: Meter)").grid(row=0, column=4, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame2, text="Border of page").grid(row=2, column=3, sticky = tk.NSEW) + tk.Label(self.arucoMarkerUIFrame2, text="Border of page").grid(row=2, 
column=4, sticky = tk.NSEW) + + self.arucoMarkerSavePageBorderXStr = tk.StringVar() + self.arucoMarkerSavePageBorderXStr.set("0.02") + self.arucoMarkerSavePageBorderYStr = tk.StringVar() + self.arucoMarkerSavePageBorderYStr.set("0.02") + + tk.Entry(self.arucoMarkerUIFrame2, textvariable=self.arucoMarkerSavePageBorderXStr).grid(row=1, column=3, sticky = tk.NSEW) + tk.Entry(self.arucoMarkerUIFrame2, textvariable=self.arucoMarkerSavePageBorderYStr).grid(row=1, column=4, sticky = tk.NSEW) + + self.arucoMarkerDictionaryMenue['menu'].delete(0, 'end') + for dictName in self.dictList: + self.arucoMarkerDictionaryMenue['menu'].add_command(label=dictName, command=tk._setit(self.arucoMarkerDictionaryStr, dictName, self.OnSelectArucoMarkerDictionary)) + + self.OnSelectArucoMarkerDictionary("DICT_ARUCO_ORIGINAL") + + def OnPreviewOrSaveChessMarker(self, askSave = False): + try: + sizeX = int(self.chessMarkerChessboardSizeXStr.get()) + sizeY = int(self.chessMarkerChessboardSizeYStr.get()) + squareLength = float(self.chessMarkerSquareLengthStr.get()) + subSizeX = int(self.chessMarkerSaveSubSizeXStr.get()) + subSizeY = int(self.chessMarkerSaveSubSizeYStr.get()) + pageBorderX = float(self.chessMarkerSavePageBorderXStr.get()) + pageBorderY = float(self.chessMarkerSavePageBorderYStr.get()) + except ValueError as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Invalid parameters") + return + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Failed to get parameters") + return + + # Preview + try: + dpi=self.VisDPI(((sizeY * squareLength + pageBorderY * 2) * MarkerPrinter.ptPerMeter, (sizeX * squareLength + pageBorderX * 2) * MarkerPrinter.ptPerMeter)) + tkImage = PIL.ImageTk.PhotoImage(image = MarkerPrinter.PreviewChessMarkerImage((sizeX, sizeY), squareLength, pageBorder = (pageBorderX, pageBorderY), dpi=dpi)) + self.chessMarkerImageLabel.imgtk = tkImage + self.chessMarkerImageLabel.config(image=tkImage) + except Exception as e: + warnings.warn(str(e)) + messagebox.showinfo("Error", "Create marker failed") + return + + # Save + if(askSave): + MarkerPrinterGUI.__SaveMarker(MarkerPrinter.GenChessMarkerImage, \ + (sizeX, sizeY), squareLength, subSize = (subSizeX, subSizeY), pageBorder = (pageBorderX, pageBorderY)) + + def OnPreviewChessMarker(self): + self.OnPreviewOrSaveChessMarker(askSave = False) + + def OnSaveChessMarker(self): + self.OnPreviewOrSaveChessMarker(askSave = True) + + def InitChessMarkerTab(self): + self.chessMarkerUIFrame = ttk.Frame(self.chessMarkerTab) + self.chessMarkerImageTab = ttk.Frame(self.chessMarkerTab) + self.chessMarkerUIFrame2 = ttk.Frame(self.chessMarkerTab) + + self.chessMarkerUIFrame.grid(row=0, column=0, sticky = tk.NSEW) + self.chessMarkerImageTab.grid(row=1, column=0, sticky = tk.NSEW) + self.chessMarkerUIFrame2.grid(row=2, column=0, sticky = tk.NSEW) + + self.chessMarkerImageLabel = tk.Label(self.chessMarkerImageTab) + self.chessMarkerImageLabel.grid(row=0, column=0, sticky = tk.NSEW) + + tk.Label(self.chessMarkerUIFrame, text="chessboardSizeX").grid(row=0, column=0, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame, text="chessboardSizeY").grid(row=0, column=1, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame, text="squareLength (Unit: Meter)").grid(row=0, column=2, sticky = tk.NSEW) + + self.chessMarkerChessboardSizeXStr = tk.StringVar() + self.chessMarkerChessboardSizeXStr.set("16") + self.chessMarkerChessboardSizeYStr = tk.StringVar() + self.chessMarkerChessboardSizeYStr.set("9") + self.chessMarkerSquareLengthStr = 
tk.StringVar() + self.chessMarkerSquareLengthStr.set("0.09") + + tk.Entry(self.chessMarkerUIFrame, textvariable=self.chessMarkerChessboardSizeXStr).grid(row=1, column=0, sticky = tk.NSEW) + tk.Entry(self.chessMarkerUIFrame, textvariable=self.chessMarkerChessboardSizeYStr).grid(row=1, column=1, sticky = tk.NSEW) + tk.Entry(self.chessMarkerUIFrame, textvariable=self.chessMarkerSquareLengthStr).grid(row=1, column=2, sticky = tk.NSEW) + + tk.Button(self.chessMarkerUIFrame2, text = "Preview", command = self.OnPreviewChessMarker).grid(row=1, column=0, sticky = tk.NSEW) + tk.Button(self.chessMarkerUIFrame2, text = "Save", command = self.OnSaveChessMarker).grid(row=1, column=1, sticky = tk.NSEW) + + tk.Label(self.chessMarkerUIFrame2, text="Save options:").grid(row=0, column=2, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="(set 0 to disable)").grid(row=1, column=2, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="subSizeX").grid(row=0, column=3, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="subSizeY").grid(row=0, column=4, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="Divide to chunks, chunk sizeX").grid(row=2, column=3, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="Divide to chunks, chunk sizeY").grid(row=2, column=4, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="pageBorderX (Unit: Meter)").grid(row=0, column=5, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="pageBorderY (Unit: Meter)").grid(row=0, column=6, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="Border of page").grid(row=2, column=5, sticky = tk.NSEW) + tk.Label(self.chessMarkerUIFrame2, text="Border of page").grid(row=2, column=6, sticky = tk.NSEW) + + self.chessMarkerSaveSubSizeXStr = tk.StringVar() + self.chessMarkerSaveSubSizeXStr.set("0") + self.chessMarkerSaveSubSizeYStr = tk.StringVar() + self.chessMarkerSaveSubSizeYStr.set("0") + self.chessMarkerSavePageBorderXStr = tk.StringVar() + self.chessMarkerSavePageBorderXStr.set("0.02") + self.chessMarkerSavePageBorderYStr = tk.StringVar() + self.chessMarkerSavePageBorderYStr.set("0.02") + + tk.Entry(self.chessMarkerUIFrame2, textvariable=self.chessMarkerSaveSubSizeXStr).grid(row=1, column=3, sticky = tk.NSEW) + tk.Entry(self.chessMarkerUIFrame2, textvariable=self.chessMarkerSaveSubSizeYStr).grid(row=1, column=4, sticky = tk.NSEW) + tk.Entry(self.chessMarkerUIFrame2, textvariable=self.chessMarkerSavePageBorderXStr).grid(row=1, column=5, sticky = tk.NSEW) + tk.Entry(self.chessMarkerUIFrame2, textvariable=self.chessMarkerSavePageBorderYStr).grid(row=1, column=6, sticky = tk.NSEW) + + def Update(self): + time.sleep(0) + self.window.after(self.delay, self.Update) + + def __init__(self, pDelay=15, pDisplayShape=(int(400), int(1200))): + self.delay = pDelay + self.displayShape = pDisplayShape + + self.dictList = MarkerPrinter.arucoDictBytesList.keys() + + # GUI + self.window = tk.Tk() + self.notebook = ttk.Notebook(self.window) + self.notebook.grid(row=0, column=0, sticky = tk.NSEW) + + self.window.title("MarkerPrinterGUI") + self.window.config(cursor="arrow") + self.window.protocol("WM_DELETE_WINDOW", self.OnCloseWindow) + + # Menus + self.menu = tk.Menu(self.window) + self.helpMenu = tk.Menu(self.menu, tearoff=0) + self.menu.add_cascade(label="Help", menu=self.helpMenu) + self.helpMenu.add_command(label="Github", command=self.OnShowingHelpGithub) + self.helpMenu.add_command(label="DEBUG_LINE_MODE", command=self.On_DEBUG_LINE_MODE) + 
self.helpMenu.add_command(label="DEBUG_BLOCK_MODE", command=self.On_DEBUG_BLOCK_MODE) + self.helpMenu.add_command(label="CLOSE_DEBUG_MODE", command=self.On_CLOSE_DEBUG_MODE) + self.window.config(menu=self.menu) + + self.charucoMarkerTab = ttk.Frame(self.notebook) + self.arucoMarkerTab = ttk.Frame(self.notebook) + self.arucoGridMarkerTab = ttk.Frame(self.notebook) + self.chessMarkerTab = ttk.Frame(self.notebook) + + self.notebook.add(self.charucoMarkerTab, text='ChArUco Marker') + self.notebook.add(self.arucoMarkerTab, text='ArUco Marker') + self.notebook.add(self.arucoGridMarkerTab, text='ArUcoGrid Marker') + self.notebook.add(self.chessMarkerTab, text='Chessboard Marker') + + self.InitCharucoMarkerTab() + self.InitArucoMarkerTab() + self.InitArucoGridMarkerTab() + self.InitChessMarkerTab() + + self.Update() + self.window.mainloop() + + def On_DEBUG_LINE_MODE(self): + messagebox.showinfo("Note", "You enabled the debug mode: \"LINE\"") + MarkerPrinter.debugMode = "LINE" + + def On_DEBUG_BLOCK_MODE(self): + messagebox.showinfo("Note", "You enabled the debug mode: \"BLOCK\"") + MarkerPrinter.debugMode = "BLOCK" + + def On_CLOSE_DEBUG_MODE(self): + messagebox.showinfo("Note", "You closed the debug mode") + MarkerPrinter.debugMode = None + +if __name__ == '__main__': + MarkerPrinterGUI() diff --git a/modules/aruco/misc/pattern_generator/README.md b/modules/aruco/misc/pattern_generator/README.md new file mode 100644 index 00000000000..994dc9311b7 --- /dev/null +++ b/modules/aruco/misc/pattern_generator/README.md @@ -0,0 +1,68 @@ +# OpenCVMarkerPrinter + +## Description +This small app can save some commonly used opencv markers such as ArUco, ArUcoGrid, Chessboard and ChArUco to vector graphics file. **Supported vector graphics file format: .svg, .pdf and .ps.** + + + +### Dependencies +#### MarkerPrinter + * numpy + * PIL(Pillow, for image processing) + * cairo(for drawing vector graphic) + * cairosvg(for svg to png) + +#### MarkerPrinterGUI + * tkinter(for GUI) + +## Tutorial +#### GUI +``` +python MarkerPrinterGUI.py +``` + +You can switch ArUco, ArUcoGrid, Chessboard and ChArUco mode at the GUI tab, then you can select dictionary from the GUI menu and modify board shape, marker size, border width... etc. at the GUI entry, finally click the preview or save button to show the marker image on the GUI window or save it to file. + +#### Command-Line +##### Print help +``` +python MarkerPrinter.py +``` + +##### Print predefined dictionary list +``` +python MarkerPrinter.py --list_dictionary +``` + +##### Save chessboard +``` +python MarkerPrinter.py --chess --file "./chess.pdf" --size_x 16 --size_y 9 --square_length 0.09 +``` + +##### Save ArUco +``` +python MarkerPrinter.py --aruco --file "./aruco.pdf" --dictionary DICT_ARUCO_ORIGINAL --marker_length 0.07 --marker_id 0 --border_bits 1 +``` + +##### Save ArUco Grid +``` +python MarkerPrinter.py --aruco_grid --file "./aruco_grid.pdf" --dictionary DICT_ARUCO_ORIGINAL --size_x 16 --size_y 9 --marker_length 0.07 --marker_separation 0.02 --first_marker 0 --border_bits 1 +``` + +##### Save ChArUco +``` +python MarkerPrinter.py --charuco --file "./charuco.pdf" --dictionary DICT_ARUCO_ORIGINAL --size_x 16 --size_y 9 --square_length 0.09 --marker_length 0.07 --border_bits 1 +``` + +## Useful Options: +### Divde output to chunks +If you are using consumer level printer, you will suffer from not able printing too large marker, so just set chunks shape at the GUI subSize entry before saving the marker to files, it will divide output marker to chunks. 
+ +### Page border +If you print the image directly, you will need to add a page border to protect the marker. Set the page border in the GUI pageBorder entries before saving the marker to files. If you are using the command-line interface, just add --page_border_x x --page_border_y y as parameters. + +### Generate aruco data +Built-in aruco dictionary data is included, but if you want to regenerate it (e.g. when aruco updates its predefined dictionary list), just install opencv-python and opencv-contrib-python, and then run +``` +python MarkerPrinter.py --generate arucoDictBytesList.npz +``` diff --git a/modules/aruco/misc/pattern_generator/arucoDictBytesList.npz b/modules/aruco/misc/pattern_generator/arucoDictBytesList.npz new file mode 100644 index 00000000000..64bcbc96cb5 Binary files /dev/null and b/modules/aruco/misc/pattern_generator/arucoDictBytesList.npz differ diff --git a/modules/aruco/misc/pattern_generator/doc/images/MarkerPrinterGUI.jpg b/modules/aruco/misc/pattern_generator/doc/images/MarkerPrinterGUI.jpg new file mode 100644 index 00000000000..4aed275565a Binary files /dev/null and b/modules/aruco/misc/pattern_generator/doc/images/MarkerPrinterGUI.jpg differ diff --git a/modules/aruco/src/aruco.cpp b/modules/aruco/src/aruco.cpp index 41b137d3c5b..dee2669ebc1 100644 --- a/modules/aruco/src/aruco.cpp +++ b/modules/aruco/src/aruco.cpp @@ -345,49 +345,6 @@ static void _filterTooCloseCandidates(const vector< vector< Point2f > > &candida contoursSetOut.push_back(smallerContours); } - -/** - * ParallelLoopBody class for the parallelization of the basic candidate detections using - * different threhold window sizes. Called from function _detectInitialCandidates() - */ -class DetectInitialCandidatesParallel : public ParallelLoopBody { - public: - DetectInitialCandidatesParallel(const Mat *_grey, - vector< vector< vector< Point2f > > > *_candidatesArrays, - vector< vector< vector< Point > > > *_contoursArrays, - const Ptr<DetectorParameters> &_params) - : grey(_grey), candidatesArrays(_candidatesArrays), contoursArrays(_contoursArrays), - params(_params) {} - - void operator()(const Range &range) const CV_OVERRIDE { - const int begin = range.start; - const int end = range.end; - - for(int i = begin; i < end; i++) { - int currScale = - params->adaptiveThreshWinSizeMin + i * params->adaptiveThreshWinSizeStep; - // threshold - Mat thresh; - _threshold(*grey, thresh, currScale, params->adaptiveThreshConstant); - - // detect rectangles - _findMarkerContours(thresh, (*candidatesArrays)[i], (*contoursArrays)[i], - params->minMarkerPerimeterRate, params->maxMarkerPerimeterRate, - params->polygonalApproxAccuracyRate, params->minCornerDistanceRate, - params->minDistanceToBorder); - } - } - - private: - DetectInitialCandidatesParallel &operator=(const DetectInitialCandidatesParallel &); - - const Mat *grey; - vector< vector< vector< Point2f > > > *candidatesArrays; - vector< vector< vector< Point > > > *contoursArrays; - const Ptr<DetectorParameters> &params; -}; - - /** * @brief Initial steps on finding square candidates */ @@ -407,21 +364,23 @@ static void _detectInitialCandidates(const Mat &grey, vector< vector< Point2f > vector< vector< vector< Point > > > contoursArrays((size_t) nScales); ////for each value in the interval of thresholding window sizes - // for(int i = 0; i < nScales; i++) { - // int currScale = params.adaptiveThreshWinSizeMin + i*params.adaptiveThreshWinSizeStep; - // // treshold - // Mat thresh; - // _threshold(grey, 
diff --git a/modules/aruco/misc/pattern_generator/arucoDictBytesList.npz b/modules/aruco/misc/pattern_generator/arucoDictBytesList.npz
new file mode 100644
index 00000000000..64bcbc96cb5
Binary files /dev/null and b/modules/aruco/misc/pattern_generator/arucoDictBytesList.npz differ
diff --git a/modules/aruco/misc/pattern_generator/doc/images/MarkerPrinterGUI.jpg b/modules/aruco/misc/pattern_generator/doc/images/MarkerPrinterGUI.jpg
new file mode 100644
index 00000000000..4aed275565a
Binary files /dev/null and b/modules/aruco/misc/pattern_generator/doc/images/MarkerPrinterGUI.jpg differ
diff --git a/modules/aruco/src/aruco.cpp b/modules/aruco/src/aruco.cpp
index 41b137d3c5b..dee2669ebc1 100644
--- a/modules/aruco/src/aruco.cpp
+++ b/modules/aruco/src/aruco.cpp
@@ -345,49 +345,6 @@ static void _filterTooCloseCandidates(const vector< vector< Point2f > > &candida
         contoursSetOut.push_back(smallerContours);
 }
-
-/**
- * ParallelLoopBody class for the parallelization of the basic candidate detections using
- * different threshold window sizes. Called from function _detectInitialCandidates()
- */
-class DetectInitialCandidatesParallel : public ParallelLoopBody {
-    public:
-    DetectInitialCandidatesParallel(const Mat *_grey,
-                                    vector< vector< vector< Point2f > > > *_candidatesArrays,
-                                    vector< vector< vector< Point > > > *_contoursArrays,
-                                    const Ptr<DetectorParameters> &_params)
-        : grey(_grey), candidatesArrays(_candidatesArrays), contoursArrays(_contoursArrays),
-          params(_params) {}
-
-    void operator()(const Range &range) const CV_OVERRIDE {
-        const int begin = range.start;
-        const int end = range.end;
-
-        for(int i = begin; i < end; i++) {
-            int currScale =
-                params->adaptiveThreshWinSizeMin + i * params->adaptiveThreshWinSizeStep;
-            // threshold
-            Mat thresh;
-            _threshold(*grey, thresh, currScale, params->adaptiveThreshConstant);
-
-            // detect rectangles
-            _findMarkerContours(thresh, (*candidatesArrays)[i], (*contoursArrays)[i],
-                                params->minMarkerPerimeterRate, params->maxMarkerPerimeterRate,
-                                params->polygonalApproxAccuracyRate, params->minCornerDistanceRate,
-                                params->minDistanceToBorder);
-        }
-    }
-
-    private:
-    DetectInitialCandidatesParallel &operator=(const DetectInitialCandidatesParallel &);
-
-    const Mat *grey;
-    vector< vector< vector< Point2f > > > *candidatesArrays;
-    vector< vector< vector< Point > > > *contoursArrays;
-    const Ptr<DetectorParameters> &params;
-};
-
-
 /**
  * @brief Initial steps on finding square candidates
  */
@@ -407,21 +364,23 @@ static void _detectInitialCandidates(const Mat &grey, vector< vector< Point2f >
     vector< vector< vector< Point > > > contoursArrays((size_t) nScales);
 
     ////for each value in the interval of thresholding window sizes
-    // for(int i = 0; i < nScales; i++) {
-    //    int currScale = params.adaptiveThreshWinSizeMin + i*params.adaptiveThreshWinSizeStep;
-    //    // threshold
-    //    Mat thresh;
-    //    _threshold(grey, thresh, currScale, params.adaptiveThreshConstant);
-    //    // detect rectangles
-    //    _findMarkerContours(thresh, candidatesArrays[i], contoursArrays[i],
-    //                        params.minMarkerPerimeterRate,
-    //                        params.maxMarkerPerimeterRate, params.polygonalApproxAccuracyRate,
-    //                        params.minCornerDistance, params.minDistanceToBorder);
-    //}
-
-    // this is the parallel call for the previous commented loop (result is equivalent)
-    parallel_for_(Range(0, nScales), DetectInitialCandidatesParallel(&grey, &candidatesArrays,
-                                                                     &contoursArrays, params));
+    parallel_for_(Range(0, nScales), [&](const Range& range) {
+        const int begin = range.start;
+        const int end = range.end;
+
+        for (int i = begin; i < end; i++) {
+            int currScale = params->adaptiveThreshWinSizeMin + i * params->adaptiveThreshWinSizeStep;
+            // threshold
+            Mat thresh;
+            _threshold(grey, thresh, currScale, params->adaptiveThreshConstant);
+
+            // detect rectangles
+            _findMarkerContours(thresh, candidatesArrays[i], contoursArrays[i],
+                                params->minMarkerPerimeterRate, params->maxMarkerPerimeterRate,
+                                params->polygonalApproxAccuracyRate, params->minCornerDistanceRate,
+                                params->minDistanceToBorder);
+        }
+    });
 
     // join candidates
     for(int i = 0; i < nScales; i++) {
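This hunk shows the refactoring pattern applied throughout the patch: each boilerplate ParallelLoopBody subclass is replaced by a C++11 lambda passed straight to parallel_for_, which captures the surrounding locals by reference instead of carrying them as class members. A minimal self-contained sketch of the same transformation (the names and the squaring workload are illustrative only, not from the patch):
```cpp
#include <opencv2/core.hpp>
#include <vector>

int main()
{
    std::vector<int> results(100);

    // Equivalent to a ParallelLoopBody subclass holding a pointer to `results`:
    // the lambda captures it by reference, and parallel_for_ splits the range
    // across the available worker threads.
    cv::parallel_for_(cv::Range(0, (int)results.size()), [&](const cv::Range& range) {
        for (int i = range.start; i < range.end; i++)
            results[i] = i * i; // per-index work, no shared writes across indices
    });
    return 0;
}
```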
@@ -612,50 +571,6 @@ static uint8_t _identifyOneCandidate(const Ptr<Dictionary>& dictionary, InputArr
     return typ;
 }
-
-/**
- * ParallelLoopBody class for the parallelization of the marker identification step
- * Called from function _identifyCandidates()
- */
-class IdentifyCandidatesParallel : public ParallelLoopBody {
-    public:
-    IdentifyCandidatesParallel(const Mat& _grey, vector< vector< Point2f > >& _candidates,
-                               const Ptr<Dictionary> &_dictionary,
-                               vector< int >& _idsTmp, vector< uint8_t >& _validCandidates,
-                               const Ptr<DetectorParameters> &_params,
-                               vector< int > &_rotated)
-        : grey(_grey), candidates(_candidates), dictionary(_dictionary),
-          idsTmp(_idsTmp), validCandidates(_validCandidates), params(_params), rotated(_rotated) {}
-
-    void operator()(const Range &range) const CV_OVERRIDE
-    {
-        const int begin = range.start;
-        const int end = range.end;
-
-        for(int i = begin; i < end; i++) {
-            int currId;
-            validCandidates[i] = _identifyOneCandidate(dictionary, grey, candidates[i], currId, params, rotated[i]);
-
-            if(validCandidates[i] > 0)
-                idsTmp[i] = currId;
-        }
-
-    }
-
-    private:
-    IdentifyCandidatesParallel &operator=(const IdentifyCandidatesParallel &); // to quiet MSVC
-
-    const Mat &grey;
-    vector< vector< Point2f > >& candidates;
-    const Ptr<Dictionary> &dictionary;
-    vector< int > &idsTmp;
-    vector< uint8_t > &validCandidates;
-    const Ptr<DetectorParameters> &params;
-    vector< int > &rotated;
-};
-
-
-
 /**
  * @brief Copy the contents of a corners vector to an OutputArray, setting its size.
  */
@@ -721,11 +636,20 @@ static void _identifyCandidates(InputArray _image, vector< vector< vector< Point
     vector< uint8_t > validCandidates(ncandidates, 0);
 
     //// Analyze each of the candidates
-    parallel_for_(Range(0, ncandidates),
-                  IdentifyCandidatesParallel(grey,
-                                             params->detectInvertedMarker ? _candidatesSet[1] : _candidatesSet[0],
-                                             _dictionary, idsTmp,
-                                             validCandidates, params, rotated));
+    parallel_for_(Range(0, ncandidates), [&](const Range &range) {
+        const int begin = range.start;
+        const int end = range.end;
+
+        vector< vector< Point2f > >& candidates = params->detectInvertedMarker ? _candidatesSet[1] : _candidatesSet[0];
+
+        for(int i = begin; i < end; i++) {
+            int currId;
+            validCandidates[i] = _identifyOneCandidate(_dictionary, grey, candidates[i], currId, params, rotated[i]);
+
+            if(validCandidates[i] > 0)
+                idsTmp[i] = currId;
+        }
+    });
 
     for(int i = 0; i < ncandidates; i++) {
         if(validCandidates[i] > 0) {
@@ -776,40 +700,6 @@ static void _getSingleMarkerObjectPoints(float markerLength, OutputArray _objPoi
     objPoints.ptr< Vec3f >(0)[3] = Vec3f(-markerLength / 2.f, -markerLength / 2.f, 0);
 }
-
-
-
-/**
- * ParallelLoopBody class for the parallelization of the marker corner subpixel refinement
- * Called from function detectMarkers()
- */
-class MarkerSubpixelParallel : public ParallelLoopBody {
-    public:
-    MarkerSubpixelParallel(const Mat *_grey, OutputArrayOfArrays _corners,
-                           const Ptr<DetectorParameters> &_params)
-        : grey(_grey), corners(_corners), params(_params) {}
-
-    void operator()(const Range &range) const CV_OVERRIDE {
-        const int begin = range.start;
-        const int end = range.end;
-
-        for(int i = begin; i < end; i++) {
-            cornerSubPix(*grey, corners.getMat(i),
-                         Size(params->cornerRefinementWinSize, params->cornerRefinementWinSize),
-                         Size(-1, -1), TermCriteria(TermCriteria::MAX_ITER | TermCriteria::EPS,
-                                                    params->cornerRefinementMaxIterations,
-                                                    params->cornerRefinementMinAccuracy));
-        }
-    }
-
-    private:
-    MarkerSubpixelParallel &operator=(const MarkerSubpixelParallel &); // to quiet MSVC
-
-    const Mat *grey;
-    OutputArrayOfArrays corners;
-    const Ptr<DetectorParameters> &params;
-};
-
 /**
  * Line fitting A * B = C :: Called from function refineCandidateLines
  * @param nContours, contour-container
@@ -952,34 +842,6 @@ static void _refineCandidateLines(std::vector<Point>& nContours, std::vector<Point2f
 
-
-/**
- * ParallelLoopBody class for the parallelization of the marker contour refinement
- * Called from function detectMarkers()
- */
-class MarkerContourParallel : public ParallelLoopBody {
-    public:
-    MarkerContourParallel(vector< vector< Point > >& _contours, vector< vector< Point2f > >& _candidates,
-                          const Mat& _camMatrix, const Mat& _distCoeff)
-        : contours(_contours), candidates(_candidates), camMatrix(_camMatrix), distCoeff(_distCoeff){}
-
-    void operator()(const Range &range) const CV_OVERRIDE {
-
-        for(int i = range.start; i < range.end; i++) {
-            _refineCandidateLines(contours[i], candidates[i], camMatrix, distCoeff);
-        }
-    }
-
-    private:
-    MarkerContourParallel &operator=(const MarkerContourParallel &){
-        return *this;
-    }
-
-    vector< vector< Point > >& contours;
-    vector< vector< Point2f > >& candidates;
-    const Mat& camMatrix;
-    const Mat& distCoeff;
-};
-
 #ifdef APRIL_DEBUG
 static void _darken(const Mat &im){
     for (int y = 0; y < im.rows; y++) {
@@ -1159,17 +1021,19 @@ void detectMarkers(InputArray _image, const Ptr<Dictionary> &_dictionary, Output
               _params->cornerRefinementMinAccuracy > 0);
 
     //// do corner refinement for each of the detected markers
-    // for (unsigned int i = 0; i < _corners.cols(); i++) {
-    //    cornerSubPix(grey, _corners.getMat(i),
-    //                 Size(params.cornerRefinementWinSize, params.cornerRefinementWinSize),
-    //                 Size(-1, -1), TermCriteria(TermCriteria::MAX_ITER | TermCriteria::EPS,
-    //                                            params.cornerRefinementMaxIterations,
-    //                                            params.cornerRefinementMinAccuracy));
-    //}
-
-    // this is the parallel call for the previous commented loop (result is equivalent)
-    parallel_for_(Range(0, _corners.cols()),
-                  MarkerSubpixelParallel(&grey, _corners, _params));
+    parallel_for_(Range(0, _corners.cols()), [&](const Range& range) {
+        const int begin = range.start;
+        const int end = range.end;
+
+        for (int i = begin; i < end; i++) {
+            cornerSubPix(grey, _corners.getMat(i),
+                         Size(_params->cornerRefinementWinSize, _params->cornerRefinementWinSize),
+                         Size(-1, -1),
+                         TermCriteria(TermCriteria::MAX_ITER | TermCriteria::EPS,
+                                      _params->cornerRefinementMaxIterations,
+                                      _params->cornerRefinementMinAccuracy));
+        }
+    });
 }
 /// STEP 3, Optional : Corner refinement :: use contour container
@@ -1178,7 +1042,12 @@ void detectMarkers(InputArray _image, const Ptr<Dictionary> &_dictionary, Output
 
     if(! _ids.empty()){
         // do corner refinement using the contours for each detected markers
-        parallel_for_(Range(0, _corners.cols()), MarkerContourParallel(contours, candidates, camMatrix.getMat(), distCoeff.getMat()));
+        parallel_for_(Range(0, _corners.cols()), [&](const Range& range) {
+            for (int i = range.start; i < range.end; i++) {
+                _refineCandidateLines(contours[i], candidates[i], camMatrix.getMat(),
+                                      distCoeff.getMat());
+            }
+        });
 
         // copy the corners to the output array
         _copyVector2Output(candidates, _corners);
@@ -1186,42 +1055,6 @@ void detectMarkers(InputArray _image, const Ptr<Dictionary> &_dictionary, Output
     }
 }
-
-
-/**
- * ParallelLoopBody class for the parallelization of the single markers pose estimation
- * Called from function estimatePoseSingleMarkers()
- */
-class SinglePoseEstimationParallel : public ParallelLoopBody {
-    public:
-    SinglePoseEstimationParallel(Mat& _markerObjPoints, InputArrayOfArrays _corners,
-                                 InputArray _cameraMatrix, InputArray _distCoeffs,
-                                 Mat& _rvecs, Mat& _tvecs)
-        : markerObjPoints(_markerObjPoints), corners(_corners), cameraMatrix(_cameraMatrix),
-          distCoeffs(_distCoeffs), rvecs(_rvecs), tvecs(_tvecs) {}
-
-    void operator()(const Range &range) const CV_OVERRIDE {
-        const int begin = range.start;
-        const int end = range.end;
-
-        for(int i = begin; i < end; i++) {
-            solvePnP(markerObjPoints, corners.getMat(i), cameraMatrix, distCoeffs,
-                     rvecs.at<Vec3d>(i), tvecs.at<Vec3d>(i));
-        }
-    }
-
-    private:
-    SinglePoseEstimationParallel &operator=(const SinglePoseEstimationParallel &); // to quiet MSVC
-
-    Mat& markerObjPoints;
-    InputArrayOfArrays corners;
-    InputArray cameraMatrix, distCoeffs;
-    Mat& rvecs, tvecs;
-};
-
-
-
 /**
  */
 void estimatePoseSingleMarkers(InputArrayOfArrays _corners, float markerLength,
@@ -1239,15 +1072,16 @@ void estimatePoseSingleMarkers(InputArrayOfArrays _corners, float markerLength,
     Mat rvecs = _rvecs.getMat(), tvecs = _tvecs.getMat();
 
     //// for each marker, calculate its pose
-    // for (int i = 0; i < nMarkers; i++) {
-    //    solvePnP(markerObjPoints, _corners.getMat(i), _cameraMatrix, _distCoeffs,
-    //             _rvecs.getMat(i), _tvecs.getMat(i));
-    //}
-
-    // this is the parallel call for the previous commented loop (result is equivalent)
-    parallel_for_(Range(0, nMarkers),
-                  SinglePoseEstimationParallel(markerObjPoints, _corners, _cameraMatrix,
-                                               _distCoeffs, rvecs, tvecs));
+    parallel_for_(Range(0, nMarkers), [&](const Range& range) {
+        const int begin = range.start;
+        const int end = range.end;
+
+        for (int i = begin; i < end; i++) {
+            solvePnP(markerObjPoints, _corners.getMat(i), _cameraMatrix, _distCoeffs, rvecs.at<Vec3d>(i),
+                     tvecs.at<Vec3d>(i));
+        }
+    });
+
     if(_objPoints.needed()){
         markerObjPoints.convertTo(_objPoints, -1);
     }
@@ -1599,8 +1433,8 @@ void refineDetectedMarkers(InputArray _image, const Ptr<Board> &_board,
 /**
  */
 int estimatePoseBoard(InputArrayOfArrays _corners, InputArray _ids, const Ptr<Board> &board,
-                      InputArray _cameraMatrix, InputArray _distCoeffs, OutputArray _rvec,
-                      OutputArray _tvec, bool useExtrinsicGuess) {
+                      InputArray _cameraMatrix, InputArray _distCoeffs, InputOutputArray _rvec,
+                      InputOutputArray _tvec, bool useExtrinsicGuess) {
 
     CV_Assert(_corners.total() == _ids.total());
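The OutputArray to InputOutputArray change matters for useExtrinsicGuess: the caller's rvec/tvec must now be readable as an initial estimate as well as writable for the result. A hedged sketch of the intended call pattern (the helper name and the frame-to-frame seeding are illustrative, not from the patch):
```cpp
#include <opencv2/aruco.hpp>
#include <vector>

// Sketch: seed board pose estimation with the previous frame's pose. With the
// InputOutputArray signature, rvec/tvec carry the initial guess in (when
// useExtrinsicGuess is true) and the refined pose out.
static int trackBoardPose(const std::vector<std::vector<cv::Point2f> >& corners,
                          const std::vector<int>& ids,
                          const cv::Ptr<cv::aruco::Board>& board,
                          const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                          cv::Vec3d& rvec, cv::Vec3d& tvec) // in: last pose, out: new pose
{
    return cv::aruco::estimatePoseBoard(corners, ids, board, cameraMatrix, distCoeffs,
                                        rvec, tvec, /*useExtrinsicGuess=*/true);
}
```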
diff --git a/modules/aruco/src/charuco.cpp b/modules/aruco/src/charuco.cpp
index e1cf224371b..d39dca9960d 100644
--- a/modules/aruco/src/charuco.cpp
+++ b/modules/aruco/src/charuco.cpp
@@ -270,50 +270,6 @@ static int _filterCornersWithoutMinMarkers(const Ptr<CharucoBoard> &_board,
     return (int)_filteredCharucoIds.total();
 }
-
-/**
- * ParallelLoopBody class for the parallelization of the charuco corners subpixel refinement
- * Called from function _selectAndRefineChessboardCorners()
- */
-class CharucoSubpixelParallel : public ParallelLoopBody {
-    public:
-    CharucoSubpixelParallel(const Mat *_grey, vector< Point2f > *_filteredChessboardImgPoints,
-                            vector< Size > *_filteredWinSizes, const Ptr<DetectorParameters> &_params)
-        : grey(_grey), filteredChessboardImgPoints(_filteredChessboardImgPoints),
-          filteredWinSizes(_filteredWinSizes), params(_params) {}
-
-    void operator()(const Range &range) const CV_OVERRIDE {
-        const int begin = range.start;
-        const int end = range.end;
-
-        for(int i = begin; i < end; i++) {
-            vector< Point2f > in;
-            in.push_back((*filteredChessboardImgPoints)[i]);
-            Size winSize = (*filteredWinSizes)[i];
-            if(winSize.height == -1 || winSize.width == -1)
-                winSize = Size(params->cornerRefinementWinSize, params->cornerRefinementWinSize);
-
-            cornerSubPix(*grey, in, winSize, Size(),
-                         TermCriteria(TermCriteria::MAX_ITER | TermCriteria::EPS,
-                                      params->cornerRefinementMaxIterations,
-                                      params->cornerRefinementMinAccuracy));
-
-            (*filteredChessboardImgPoints)[i] = in[0];
-        }
-    }
-
-    private:
-    CharucoSubpixelParallel &operator=(const CharucoSubpixelParallel &); // to quiet MSVC
-
-    const Mat *grey;
-    vector< Point2f > *filteredChessboardImgPoints;
-    vector< Size > *filteredWinSizes;
-    const Ptr<DetectorParameters> &params;
-};
-
-
-
 /**
  * @brief From all projected chessboard corners, select those inside the image and apply subpixel
  * refinement. Returns number of valid corners.
@@ -353,23 +309,25 @@ static int _selectAndRefineChessboardCorners(InputArray _allCorners, InputArray
     const Ptr<DetectorParameters> params = DetectorParameters::create(); // use default params for corner refinement
 
     //// For each of the charuco corners, apply subpixel refinement using its corresponding winSize
-    // for(unsigned int i=0; i<filteredChessboardImgPoints.size(); i++) {
-    //    vector< Point2f > in;
-    //    in.push_back(filteredChessboardImgPoints[i]);
-    //    Size winSize = filteredWinSizes[i];
-    //    if(winSize.height == -1 || winSize.width == -1)
-    //        winSize = Size(params.cornerRefinementWinSize, params.cornerRefinementWinSize);
-    //    cornerSubPix(grey, in, winSize, Size(),
-    //                 TermCriteria(TermCriteria::MAX_ITER | TermCriteria::EPS,
-    //                              params->cornerRefinementMaxIterations,
-    //                              params->cornerRefinementMinAccuracy));
-    //    filteredChessboardImgPoints[i] = in[0];
-    //}
-
-    // this is the parallel call for the previous commented loop (result is equivalent)
-    parallel_for_(
-        Range(0, (int)filteredChessboardImgPoints.size()),
-        CharucoSubpixelParallel(&grey, &filteredChessboardImgPoints, &filteredWinSizes, params));
+    parallel_for_(Range(0, (int)filteredChessboardImgPoints.size()), [&](const Range& range) {
+        const int begin = range.start;
+        const int end = range.end;
+
+        for (int i = begin; i < end; i++) {
+            vector<Point2f> in;
+            in.push_back(filteredChessboardImgPoints[i]);
+            Size winSize = filteredWinSizes[i];
+            if (winSize.height == -1 || winSize.width == -1)
+                winSize = Size(params->cornerRefinementWinSize, params->cornerRefinementWinSize);
+
+            cornerSubPix(grey, in, winSize, Size(),
+                         TermCriteria(TermCriteria::MAX_ITER | TermCriteria::EPS,
+                                      params->cornerRefinementMaxIterations,
+                                      params->cornerRefinementMinAccuracy));
+
+            filteredChessboardImgPoints[i] = in[0];
+        }
+    });
 
     // parse output
     Mat(filteredChessboardImgPoints).copyTo(_selectedCorners);
@@ -656,7 +614,7 @@ static bool _arePointsEnoughForPoseEstimation(const vector< Point3f > &points) {
  */
 bool estimatePoseCharucoBoard(InputArray _charucoCorners, InputArray _charucoIds,
                               const Ptr<CharucoBoard> &_board, InputArray _cameraMatrix, InputArray _distCoeffs,
-                              OutputArray _rvec, OutputArray _tvec, bool useExtrinsicGuess) {
+                              InputOutputArray _rvec, InputOutputArray _tvec, bool useExtrinsicGuess) {
 
     CV_Assert((_charucoCorners.getMat().total() == _charucoIds.getMat().total()));
diff --git a/modules/aruco/tutorials/table_of_content_aruco.markdown b/modules/aruco/tutorials/table_of_content_aruco.markdown
index 7129bdc626c..7f7932d6b51 100644
--- a/modules/aruco/tutorials/table_of_content_aruco.markdown
+++ b/modules/aruco/tutorials/table_of_content_aruco.markdown
@@ -11,6 +11,11 @@ Also, the ChArUco functionalities combine ArUco markers with traditional chessbo
 an easy and versatile corner detection. The module also includes the functions to detect ChArUco
 corners and use them for pose estimation and camera calibration.
 
+If you are going to print out the markers, a useful script/GUI tool is placed at
+opencv_contrib/modules/aruco/misc/pattern_generator/ that can generate vector graphics
+of ArUco, ArUcoGrid and ChArUco boards. It can help you print out the pattern at real size
+and without artifacts.
+
 - @subpage tutorial_aruco_detection
 
 *Compatibility:* \> OpenCV 3.0
diff --git a/modules/bioinspired/src/retina.cpp b/modules/bioinspired/src/retina.cpp
index a6b42f0501d..fa6e582cef7 100644
--- a/modules/bioinspired/src/retina.cpp
+++ b/modules/bioinspired/src/retina.cpp
@@ -562,7 +562,7 @@ bool RetinaImpl::ocl_run(InputArray inputMatToConvert)
 
 void RetinaImpl::run(InputArray inputMatToConvert)
 {
-    CV_OCL_RUN((_ocl_retina != 0 && inputMatToConvert.isUMat()), ocl_run(inputMatToConvert));
+    CV_OCL_RUN((_ocl_retina && inputMatToConvert.isUMat()), ocl_run(inputMatToConvert));
 
     _wasOCLRunCalled = false;
     // first convert input image to the compatible format : std::valarray
@@ -830,7 +830,7 @@ bool RetinaImpl::_convertCvMat2ValarrayBuffer(InputArray inputMat, std::valarray<float>
 void RetinaImpl::clearBuffers()
 {
 #ifdef HAVE_OPENCL
-    if (_ocl_retina != 0)
+    if (_ocl_retina)
         _ocl_retina->clearBuffers();
 #endif
diff --git a/modules/ccalib/include/opencv2/ccalib.hpp b/modules/ccalib/include/opencv2/ccalib.hpp
index c9b9391cb48..538ec0f81b6 100644
--- a/modules/ccalib/include/opencv2/ccalib.hpp
+++ b/modules/ccalib/include/opencv2/ccalib.hpp
@@ -96,21 +96,21 @@ class CV_EXPORTS CustomPattern : public Algorithm
     Calls the calibrateCamera function with the same inputs.
     */
-    bool findRt(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs,
-                OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int flags = SOLVEPNP_ITERATIVE);
-    bool findRt(InputArray image, InputArray cameraMatrix, InputArray distCoeffs,
-                OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int flags = SOLVEPNP_ITERATIVE);
+    bool findRt(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs,
+                InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess = false, int flags = SOLVEPNP_ITERATIVE);
+    bool findRt(InputArray image, InputArray cameraMatrix, InputArray distCoeffs,
+                InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess = false, int flags = SOLVEPNP_ITERATIVE);
     /**< Uses solvePnP to find the rotation and translation of the pattern
          with respect to the camera frame.
*/ - bool findRtRANSAC(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, - OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, - float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = SOLVEPNP_ITERATIVE); - bool findRtRANSAC(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, - OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, - float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = SOLVEPNP_ITERATIVE); + bool findRtRANSAC(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, + InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, + float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = SOLVEPNP_ITERATIVE); + bool findRtRANSAC(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, + InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, + float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = SOLVEPNP_ITERATIVE); /**< Uses solvePnPRansac() */ diff --git a/modules/ccalib/src/ccalib.cpp b/modules/ccalib/src/ccalib.cpp index 249d5b14a5c..03f1414d8e6 100644 --- a/modules/ccalib/src/ccalib.cpp +++ b/modules/ccalib/src/ccalib.cpp @@ -425,13 +425,13 @@ double CustomPattern::calibrate(InputArrayOfArrays objectPoints, InputArrayOfArr } bool CustomPattern::findRt(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, - InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess, int flags) + InputArray distCoeffs, InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess, int flags) { return solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec, useExtrinsicGuess, flags); } bool CustomPattern::findRt(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, - OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess, int flags) + InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess, int flags) { vector imagePoints; vector objectPoints; @@ -442,17 +442,17 @@ bool CustomPattern::findRt(InputArray image, InputArray cameraMatrix, InputArray } bool CustomPattern::findRtRANSAC(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, - OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess, int iterationsCount, - float reprojectionError, int minInliersCount, OutputArray inliers, int flags) + InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess, int iterationsCount, + float reprojectionError, int minInliersCount, OutputArray inliers, int flags) { solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec, useExtrinsicGuess, - iterationsCount, reprojectionError, minInliersCount, inliers, flags); + iterationsCount, reprojectionError, minInliersCount, inliers, flags); return true; // for consistency with the other methods } bool CustomPattern::findRtRANSAC(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, - OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess, int iterationsCount, - float reprojectionError, int minInliersCount, OutputArray inliers, int flags) + InputOutputArray rvec, InputOutputArray tvec, bool useExtrinsicGuess, 
int iterationsCount, + float reprojectionError, int minInliersCount, OutputArray inliers, int flags) { vector imagePoints; vector objectPoints; @@ -460,7 +460,7 @@ bool CustomPattern::findRtRANSAC(InputArray image, InputArray cameraMatrix, Inpu if (!findPattern(image, imagePoints, objectPoints)) return false; solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec, useExtrinsicGuess, - iterationsCount, reprojectionError, minInliersCount, inliers, flags); + iterationsCount, reprojectionError, minInliersCount, inliers, flags); return true; } diff --git a/modules/cudaarithm/CMakeLists.txt b/modules/cudaarithm/CMakeLists.txt new file mode 100644 index 00000000000..d552bb4ebe9 --- /dev/null +++ b/modules/cudaarithm/CMakeLists.txt @@ -0,0 +1,27 @@ +if(IOS OR WINRT OR (NOT HAVE_CUDA AND NOT BUILD_CUDA_STUBS)) + ocv_module_disable(cudaarithm) +endif() + +set(the_description "CUDA-accelerated Operations on Matrices") + +ocv_warnings_disable(CMAKE_CXX_FLAGS /wd4127 /wd4324 /wd4512 -Wundef -Wmissing-declarations -Wshadow) + +ocv_add_module(cudaarithm opencv_core OPTIONAL opencv_cudev WRAP python) + +ocv_module_include_directories() +ocv_glob_module_sources() + +set(extra_libs "") + +if(HAVE_CUBLAS) + list(APPEND extra_libs ${CUDA_cublas_LIBRARY}) +endif() + +if(HAVE_CUFFT) + list(APPEND extra_libs ${CUDA_cufft_LIBRARY}) +endif() + +ocv_create_module(${extra_libs}) + +ocv_add_accuracy_tests(DEPENDS_ON opencv_imgproc) +ocv_add_perf_tests(DEPENDS_ON opencv_imgproc) diff --git a/modules/cudaarithm/include/opencv2/cudaarithm.hpp b/modules/cudaarithm/include/opencv2/cudaarithm.hpp new file mode 100644 index 00000000000..4aa6582dcf1 --- /dev/null +++ b/modules/cudaarithm/include/opencv2/cudaarithm.hpp @@ -0,0 +1,886 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. 
+// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. +// +//M*/ + +#ifndef OPENCV_CUDAARITHM_HPP +#define OPENCV_CUDAARITHM_HPP + +#ifndef __cplusplus +# error cudaarithm.hpp header must be compiled as C++ +#endif + +#include "opencv2/core/cuda.hpp" + +/** + @addtogroup cuda + @{ + @defgroup cudaarithm Operations on Matrices + @{ + @defgroup cudaarithm_core Core Operations on Matrices + @defgroup cudaarithm_elem Per-element Operations + @defgroup cudaarithm_reduce Matrix Reductions + @defgroup cudaarithm_arithm Arithm Operations on Matrices + @} + @} + */ + +namespace cv { namespace cuda { + +//! @addtogroup cudaarithm +//! @{ + +//! @addtogroup cudaarithm_elem +//! @{ + +/** @brief Computes a matrix-matrix or matrix-scalar sum. + +@param src1 First source matrix or scalar. +@param src2 Second source matrix or scalar. Matrix should have the same size and type as src1 . +@param dst Destination matrix that has the same size and number of channels as the input array(s). +The depth is defined by dtype or src1 depth. +@param mask Optional operation mask, 8-bit single channel array, that specifies elements of the +destination array to be changed. The mask can be used only with single channel images. +@param dtype Optional depth of the output array. +@param stream Stream for the asynchronous version. + +@sa add + */ +CV_EXPORTS_W void add(InputArray src1, InputArray src2, OutputArray dst, InputArray mask = noArray(), int dtype = -1, Stream& stream = Stream::Null()); + +/** @brief Computes a matrix-matrix or matrix-scalar difference. + +@param src1 First source matrix or scalar. +@param src2 Second source matrix or scalar. Matrix should have the same size and type as src1 . +@param dst Destination matrix that has the same size and number of channels as the input array(s). +The depth is defined by dtype or src1 depth. +@param mask Optional operation mask, 8-bit single channel array, that specifies elements of the +destination array to be changed. The mask can be used only with single channel images. +@param dtype Optional depth of the output array. +@param stream Stream for the asynchronous version. + +@sa subtract + */ +CV_EXPORTS_W void subtract(InputArray src1, InputArray src2, OutputArray dst, InputArray mask = noArray(), int dtype = -1, Stream& stream = Stream::Null()); + +/** @brief Computes a matrix-matrix or matrix-scalar per-element product. + +@param src1 First source matrix or scalar. +@param src2 Second source matrix or scalar. +@param dst Destination matrix that has the same size and number of channels as the input array(s). +The depth is defined by dtype or src1 depth. +@param scale Optional scale factor. +@param dtype Optional depth of the output array. +@param stream Stream for the asynchronous version. + +@sa multiply + */ +CV_EXPORTS_W void multiply(InputArray src1, InputArray src2, OutputArray dst, double scale = 1, int dtype = -1, Stream& stream = Stream::Null()); + +/** @brief Computes a matrix-matrix or matrix-scalar division. + +@param src1 First source matrix or a scalar. 
+@param src2 Second source matrix or scalar. +@param dst Destination matrix that has the same size and number of channels as the input array(s). +The depth is defined by dtype or src1 depth. +@param scale Optional scale factor. +@param dtype Optional depth of the output array. +@param stream Stream for the asynchronous version. + +This function, in contrast to divide, uses a round-down rounding mode. + +@sa divide + */ +CV_EXPORTS_W void divide(InputArray src1, InputArray src2, OutputArray dst, double scale = 1, int dtype = -1, Stream& stream = Stream::Null()); + +/** @brief Computes per-element absolute difference of two matrices (or of a matrix and scalar). + +@param src1 First source matrix or scalar. +@param src2 Second source matrix or scalar. +@param dst Destination matrix that has the same size and type as the input array(s). +@param stream Stream for the asynchronous version. + +@sa absdiff + */ +CV_EXPORTS_W void absdiff(InputArray src1, InputArray src2, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Computes an absolute value of each matrix element. + +@param src Source matrix. +@param dst Destination matrix with the same size and type as src . +@param stream Stream for the asynchronous version. + +@sa abs + */ +CV_EXPORTS_W void abs(InputArray src, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Computes a square value of each matrix element. + +@param src Source matrix. +@param dst Destination matrix with the same size and type as src . +@param stream Stream for the asynchronous version. + */ +CV_EXPORTS_W void sqr(InputArray src, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Computes a square root of each matrix element. + +@param src Source matrix. +@param dst Destination matrix with the same size and type as src . +@param stream Stream for the asynchronous version. + +@sa sqrt + */ +CV_EXPORTS_W void sqrt(InputArray src, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Computes an exponent of each matrix element. + +@param src Source matrix. +@param dst Destination matrix with the same size and type as src . +@param stream Stream for the asynchronous version. + +@sa exp + */ +CV_EXPORTS_W void exp(InputArray src, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Computes a natural logarithm of absolute value of each matrix element. + +@param src Source matrix. +@param dst Destination matrix with the same size and type as src . +@param stream Stream for the asynchronous version. + +@sa log + */ +CV_EXPORTS_W void log(InputArray src, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Raises every matrix element to a power. + +@param src Source matrix. +@param power Exponent of power. +@param dst Destination matrix with the same size and type as src . +@param stream Stream for the asynchronous version. + +The function pow raises every element of the input matrix to power : + +\f[\texttt{dst} (I) = \fork{\texttt{src}(I)^power}{if \texttt{power} is integer}{|\texttt{src}(I)|^power}{otherwise}\f] + +@sa pow + */ +CV_EXPORTS_W void pow(InputArray src, double power, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Compares elements of two matrices (or of a matrix and scalar). + +@param src1 First source matrix or scalar. +@param src2 Second source matrix or scalar. +@param dst Destination matrix that has the same size as the input array(s) and type CV_8U. +@param cmpop Flag specifying the relation between the elements to be checked: +- **CMP_EQ:** a(.) == b(.) 
+- **CMP_GT:** a(.) \> b(.)
+- **CMP_GE:** a(.) \>= b(.)
+- **CMP_LT:** a(.) \< b(.)
+- **CMP_LE:** a(.) \<= b(.)
+- **CMP_NE:** a(.) != b(.)
+@param stream Stream for the asynchronous version.
+
+@sa compare
+ */
+CV_EXPORTS_W void compare(InputArray src1, InputArray src2, OutputArray dst, int cmpop, Stream& stream = Stream::Null());
+
+/** @brief Performs a per-element bitwise inversion.
+
+@param src Source matrix.
+@param dst Destination matrix with the same size and type as src .
+@param mask Optional operation mask, 8-bit single channel array, that specifies elements of the
+destination array to be changed. The mask can be used only with single channel images.
+@param stream Stream for the asynchronous version.
+ */
+CV_EXPORTS_W void bitwise_not(InputArray src, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null());
+
+/** @brief Performs a per-element bitwise disjunction of two matrices (or of matrix and scalar).
+
+@param src1 First source matrix or scalar.
+@param src2 Second source matrix or scalar.
+@param dst Destination matrix that has the same size and type as the input array(s).
+@param mask Optional operation mask, 8-bit single channel array, that specifies elements of the
+destination array to be changed. The mask can be used only with single channel images.
+@param stream Stream for the asynchronous version.
+ */
+CV_EXPORTS_W void bitwise_or(InputArray src1, InputArray src2, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null());
+
+/** @brief Performs a per-element bitwise conjunction of two matrices (or of matrix and scalar).
+
+@param src1 First source matrix or scalar.
+@param src2 Second source matrix or scalar.
+@param dst Destination matrix that has the same size and type as the input array(s).
+@param mask Optional operation mask, 8-bit single channel array, that specifies elements of the
+destination array to be changed. The mask can be used only with single channel images.
+@param stream Stream for the asynchronous version.
+ */
+CV_EXPORTS_W void bitwise_and(InputArray src1, InputArray src2, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null());
+
+/** @brief Performs a per-element bitwise exclusive or operation of two matrices (or of matrix and scalar).
+
+@param src1 First source matrix or scalar.
+@param src2 Second source matrix or scalar.
+@param dst Destination matrix that has the same size and type as the input array(s).
+@param mask Optional operation mask, 8-bit single channel array, that specifies elements of the
+destination array to be changed. The mask can be used only with single channel images.
+@param stream Stream for the asynchronous version.
+ */
+CV_EXPORTS_W void bitwise_xor(InputArray src1, InputArray src2, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null());
+
+/** @brief Performs pixel by pixel right shift of an image by a constant value.
+
+@param src Source matrix. Supports 1, 3 and 4 channels images with integer elements.
+@param val Constant values, one per channel.
+@param dst Destination matrix with the same size and type as src .
+@param stream Stream for the asynchronous version.
+ */
+CV_EXPORTS void rshift(InputArray src, Scalar_<int> val, OutputArray dst, Stream& stream = Stream::Null());
+
+CV_WRAP inline void rshift(InputArray src, Scalar val, OutputArray dst, Stream& stream = Stream::Null()) {
+    rshift(src, Scalar_<int>(val), dst, stream);
+}
+
+/** @brief Performs pixel by pixel left shift of an image by a constant value.
+
+@param src Source matrix. Supports 1, 3 and 4 channels images with CV_8U , CV_16U or CV_32S
+depth.
+@param val Constant values, one per channel.
+@param dst Destination matrix with the same size and type as src .
+@param stream Stream for the asynchronous version.
+ */
+CV_EXPORTS void lshift(InputArray src, Scalar_<int> val, OutputArray dst, Stream& stream = Stream::Null());
+
+CV_WRAP inline void lshift(InputArray src, Scalar val, OutputArray dst, Stream& stream = Stream::Null()) {
+    lshift(src, Scalar_<int>(val), dst, stream);
+}
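Note why `val` is a `Scalar_<int>`: the shift amounts are integral and given per channel. A small usage sketch (the image size and values are arbitrary; this assumes a build with the cudaarithm module and a CUDA device):
```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

int main()
{
    // A 3-channel 8-bit image filled with the constant (100, 150, 200).
    cv::Mat host(64, 64, CV_8UC3, cv::Scalar(100, 150, 200));
    cv::cuda::GpuMat src(host), halved, restored;

    // Right-shifting by one bit halves each channel: (50, 75, 100).
    cv::cuda::rshift(src, cv::Scalar_<int>(1, 1, 1), halved);
    // Left-shifting doubles again; since the inputs were even, nothing is lost.
    cv::cuda::lshift(halved, cv::Scalar_<int>(1, 1, 1), restored);

    cv::Mat result;
    restored.download(result); // back to host memory for inspection
    return 0;
}
```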
+
+/** @brief Computes the per-element minimum of two matrices (or a matrix and a scalar).
+
+@param src1 First source matrix or scalar.
+@param src2 Second source matrix or scalar.
+@param dst Destination matrix that has the same size and type as the input array(s).
+@param stream Stream for the asynchronous version.
+
+@sa min
+ */
+CV_EXPORTS_W void min(InputArray src1, InputArray src2, OutputArray dst, Stream& stream = Stream::Null());
+
+/** @brief Computes the per-element maximum of two matrices (or a matrix and a scalar).
+
+@param src1 First source matrix or scalar.
+@param src2 Second source matrix or scalar.
+@param dst Destination matrix that has the same size and type as the input array(s).
+@param stream Stream for the asynchronous version.
+
+@sa max
+ */
+CV_EXPORTS_W void max(InputArray src1, InputArray src2, OutputArray dst, Stream& stream = Stream::Null());
+
+/** @brief Computes the weighted sum of two arrays.
+
+@param src1 First source array.
+@param alpha Weight for the first array elements.
+@param src2 Second source array of the same size and channel number as src1 .
+@param beta Weight for the second array elements.
+@param dst Destination array that has the same size and number of channels as the input arrays.
+@param gamma Scalar added to each sum.
+@param dtype Optional depth of the destination array. When both input arrays have the same depth,
+dtype can be set to -1, which will be equivalent to src1.depth().
+@param stream Stream for the asynchronous version.
+
+The function addWeighted calculates the weighted sum of two arrays as follows:
+
+\f[\texttt{dst} (I)= \texttt{saturate} ( \texttt{src1} (I)* \texttt{alpha} + \texttt{src2} (I)* \texttt{beta} + \texttt{gamma} )\f]
+
+where I is a multi-dimensional index of array elements. In case of multi-channel arrays, each
+channel is processed independently.
+
+@sa addWeighted
+ */
+CV_EXPORTS_W void addWeighted(InputArray src1, double alpha, InputArray src2, double beta, double gamma, OutputArray dst,
+                              int dtype = -1, Stream& stream = Stream::Null());
+
+//! adds scaled array to another one (dst = alpha*src1 + src2)
+static inline void scaleAdd(InputArray src1, double alpha, InputArray src2, OutputArray dst, Stream& stream = Stream::Null())
+{
+    addWeighted(src1, alpha, src2, 1.0, 0.0, dst, -1, stream);
+}
+
+/** @brief Applies a fixed-level threshold to each array element.
+
+@param src Source array (single-channel).
+@param dst Destination array with the same size and type as src .
+@param thresh Threshold value.
+@param maxval Maximum value to use with THRESH_BINARY and THRESH_BINARY_INV threshold types.
+@param type Threshold type. For details, see threshold . The THRESH_OTSU and THRESH_TRIANGLE
+threshold types are not supported.
+@param stream Stream for the asynchronous version.
+ +@sa threshold + */ +CV_EXPORTS_W double threshold(InputArray src, OutputArray dst, double thresh, double maxval, int type, Stream& stream = Stream::Null()); + +/** @brief Computes magnitudes of complex matrix elements. + +@param xy Source complex matrix in the interleaved format ( CV_32FC2 ). +@param magnitude Destination matrix of float magnitudes ( CV_32FC1 ). +@param stream Stream for the asynchronous version. + +@sa magnitude + */ +CV_EXPORTS_W void magnitude(InputArray xy, OutputArray magnitude, Stream& stream = Stream::Null()); + +/** @brief Computes squared magnitudes of complex matrix elements. + +@param xy Source complex matrix in the interleaved format ( CV_32FC2 ). +@param magnitude Destination matrix of float magnitude squares ( CV_32FC1 ). +@param stream Stream for the asynchronous version. + */ +CV_EXPORTS_W void magnitudeSqr(InputArray xy, OutputArray magnitude, Stream& stream = Stream::Null()); + +/** @overload + computes magnitude of each (x(i), y(i)) vector + supports only floating-point source +@param x Source matrix containing real components ( CV_32FC1 ). +@param y Source matrix containing imaginary components ( CV_32FC1 ). +@param magnitude Destination matrix of float magnitudes ( CV_32FC1 ). +@param stream Stream for the asynchronous version. + */ +CV_EXPORTS_W void magnitude(InputArray x, InputArray y, OutputArray magnitude, Stream& stream = Stream::Null()); + +/** @overload + computes squared magnitude of each (x(i), y(i)) vector + supports only floating-point source +@param x Source matrix containing real components ( CV_32FC1 ). +@param y Source matrix containing imaginary components ( CV_32FC1 ). +@param magnitude Destination matrix of float magnitude squares ( CV_32FC1 ). +@param stream Stream for the asynchronous version. +*/ +CV_EXPORTS_W void magnitudeSqr(InputArray x, InputArray y, OutputArray magnitude, Stream& stream = Stream::Null()); + +/** @brief Computes polar angles of complex matrix elements. + +@param x Source matrix containing real components ( CV_32FC1 ). +@param y Source matrix containing imaginary components ( CV_32FC1 ). +@param angle Destination matrix of angles ( CV_32FC1 ). +@param angleInDegrees Flag for angles that must be evaluated in degrees. +@param stream Stream for the asynchronous version. + +@sa phase + */ +CV_EXPORTS_W void phase(InputArray x, InputArray y, OutputArray angle, bool angleInDegrees = false, Stream& stream = Stream::Null()); + +/** @brief Converts Cartesian coordinates into polar. + +@param x Source matrix containing real components ( CV_32FC1 ). +@param y Source matrix containing imaginary components ( CV_32FC1 ). +@param magnitude Destination matrix of float magnitudes ( CV_32FC1 ). +@param angle Destination matrix of angles ( CV_32FC1 ). +@param angleInDegrees Flag for angles that must be evaluated in degrees. +@param stream Stream for the asynchronous version. + +@sa cartToPolar + */ +CV_EXPORTS_W void cartToPolar(InputArray x, InputArray y, OutputArray magnitude, OutputArray angle, bool angleInDegrees = false, Stream& stream = Stream::Null()); + +/** @brief Converts polar coordinates into Cartesian. + +@param magnitude Source matrix containing magnitudes ( CV_32FC1 or CV_64FC1 ). +@param angle Source matrix containing angles ( same type as magnitude ). +@param x Destination matrix of real components ( same type as magnitude ). +@param y Destination matrix of imaginary components ( same type as magnitude ). +@param angleInDegrees Flag that indicates angles in degrees. 
+@param stream Stream for the asynchronous version. + */ +CV_EXPORTS_W void polarToCart(InputArray magnitude, InputArray angle, OutputArray x, OutputArray y, bool angleInDegrees = false, Stream& stream = Stream::Null()); + +//! @} cudaarithm_elem + +//! @addtogroup cudaarithm_core +//! @{ + +/** @brief Makes a multi-channel matrix out of several single-channel matrices. + +@param src Array/vector of source matrices. +@param n Number of source matrices. +@param dst Destination matrix. +@param stream Stream for the asynchronous version. + +@sa merge + */ +CV_EXPORTS void merge(const GpuMat* src, size_t n, OutputArray dst, Stream& stream = Stream::Null()); +/** @overload */ +CV_EXPORTS_W void merge(const std::vector& src, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Copies each plane of a multi-channel matrix into an array. + +@param src Source matrix. +@param dst Destination array/vector of single-channel matrices. +@param stream Stream for the asynchronous version. + +@sa split + */ +CV_EXPORTS void split(InputArray src, GpuMat* dst, Stream& stream = Stream::Null()); +/** @overload */ +CV_EXPORTS_W void split(InputArray src, CV_OUT std::vector& dst, Stream& stream = Stream::Null()); + +/** @brief Transposes a matrix. + +@param src1 Source matrix. 1-, 4-, 8-byte element sizes are supported for now. +@param dst Destination matrix. +@param stream Stream for the asynchronous version. + +@sa transpose + */ +CV_EXPORTS_W void transpose(InputArray src1, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Flips a 2D matrix around vertical, horizontal, or both axes. + +@param src Source matrix. Supports 1, 3 and 4 channels images with CV_8U, CV_16U, CV_32S or +CV_32F depth. +@param dst Destination matrix. +@param flipCode Flip mode for the source: +- 0 Flips around x-axis. +- \> 0 Flips around y-axis. +- \< 0 Flips around both axes. +@param stream Stream for the asynchronous version. + +@sa flip + */ +CV_EXPORTS_W void flip(InputArray src, OutputArray dst, int flipCode, Stream& stream = Stream::Null()); + +/** @brief Base class for transform using lookup table. + */ +class CV_EXPORTS_W LookUpTable : public Algorithm +{ +public: + /** @brief Transforms the source matrix into the destination matrix using the given look-up table: + dst(I) = lut(src(I)) . + + @param src Source matrix. CV_8UC1 and CV_8UC3 matrices are supported for now. + @param dst Destination matrix. + @param stream Stream for the asynchronous version. + */ + CV_WRAP virtual void transform(InputArray src, OutputArray dst, Stream& stream = Stream::Null()) = 0; +}; + +/** @brief Creates implementation for cuda::LookUpTable . + +@param lut Look-up table of 256 elements. It is a continuous CV_8U matrix. + */ +CV_EXPORTS_W Ptr createLookUpTable(InputArray lut); + +/** @brief Forms a border around an image. + +@param src Source image. CV_8UC1 , CV_8UC4 , CV_32SC1 , and CV_32FC1 types are supported. +@param dst Destination image with the same type as src. The size is +Size(src.cols+left+right, src.rows+top+bottom) . +@param top Number of top pixels +@param bottom Number of bottom pixels +@param left Number of left pixels +@param right Number of pixels in each direction from the source image rectangle to extrapolate. +For example: top=1, bottom=1, left=1, right=1 mean that 1 pixel-wide border needs to be built. +@param borderType Border type. See borderInterpolate for details. BORDER_REFLECT101 , +BORDER_REPLICATE , BORDER_CONSTANT , BORDER_REFLECT and BORDER_WRAP are supported for now. 
+@param value Border value. +@param stream Stream for the asynchronous version. + */ +CV_EXPORTS_W void copyMakeBorder(InputArray src, OutputArray dst, int top, int bottom, int left, int right, int borderType, + Scalar value = Scalar(), Stream& stream = Stream::Null()); + +//! @} cudaarithm_core + +//! @addtogroup cudaarithm_reduce +//! @{ + +/** @brief Returns the norm of a matrix (or difference of two matrices). + +@param src1 Source matrix. Any matrices except 64F are supported. +@param normType Norm type. NORM_L1 , NORM_L2 , and NORM_INF are supported for now. +@param mask optional operation mask; it must have the same size as src1 and CV_8UC1 type. + +@sa norm + */ +CV_EXPORTS_W double norm(InputArray src1, int normType, InputArray mask = noArray()); +/** @overload */ +CV_EXPORTS_W void calcNorm(InputArray src, OutputArray dst, int normType, InputArray mask = noArray(), Stream& stream = Stream::Null()); + +/** @brief Returns the difference of two matrices. + +@param src1 Source matrix. Any matrices except 64F are supported. +@param src2 Second source matrix (if any) with the same size and type as src1. +@param normType Norm type. NORM_L1 , NORM_L2 , and NORM_INF are supported for now. + +@sa norm + */ +CV_EXPORTS_W double norm(InputArray src1, InputArray src2, int normType=NORM_L2); +/** @overload */ +CV_EXPORTS_W void calcNormDiff(InputArray src1, InputArray src2, OutputArray dst, int normType=NORM_L2, Stream& stream = Stream::Null()); + +/** @brief Returns the sum of matrix elements. + +@param src Source image of any depth except for CV_64F . +@param mask optional operation mask; it must have the same size as src1 and CV_8UC1 type. + +@sa sum + */ +CV_EXPORTS_W Scalar sum(InputArray src, InputArray mask = noArray()); +/** @overload */ +CV_EXPORTS_W void calcSum(InputArray src, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null()); + +/** @brief Returns the sum of absolute values for matrix elements. + +@param src Source image of any depth except for CV_64F . +@param mask optional operation mask; it must have the same size as src1 and CV_8UC1 type. + */ +CV_EXPORTS_W Scalar absSum(InputArray src, InputArray mask = noArray()); +/** @overload */ +CV_EXPORTS_W void calcAbsSum(InputArray src, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null()); + +/** @brief Returns the squared sum of matrix elements. + +@param src Source image of any depth except for CV_64F . +@param mask optional operation mask; it must have the same size as src1 and CV_8UC1 type. + */ +CV_EXPORTS_W Scalar sqrSum(InputArray src, InputArray mask = noArray()); +/** @overload */ +CV_EXPORTS_W void calcSqrSum(InputArray src, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null()); + +/** @brief Finds global minimum and maximum matrix elements and returns their values. + +@param src Single-channel source image. +@param minVal Pointer to the returned minimum value. Use NULL if not required. +@param maxVal Pointer to the returned maximum value. Use NULL if not required. +@param mask Optional mask to select a sub-matrix. + +The function does not work with CV_64F images on GPUs with the compute capability \< 1.3. 
+ +@sa minMaxLoc + */ +CV_EXPORTS_W void minMax(InputArray src, double* minVal, double* maxVal, InputArray mask = noArray()); +/** @overload */ +CV_EXPORTS_W void findMinMax(InputArray src, OutputArray dst, InputArray mask = noArray(), Stream& stream = Stream::Null()); + +/** @brief Finds global minimum and maximum matrix elements and returns their values with locations. + +@param src Single-channel source image. +@param minVal Pointer to the returned minimum value. Use NULL if not required. +@param maxVal Pointer to the returned maximum value. Use NULL if not required. +@param minLoc Pointer to the returned minimum location. Use NULL if not required. +@param maxLoc Pointer to the returned maximum location. Use NULL if not required. +@param mask Optional mask to select a sub-matrix. + +The function does not work with CV_64F images on GPU with the compute capability \< 1.3. + +@sa minMaxLoc + */ +CV_EXPORTS_W void minMaxLoc(InputArray src, double* minVal, double* maxVal, Point* minLoc, Point* maxLoc, + InputArray mask = noArray()); +/** @overload */ +CV_EXPORTS_W void findMinMaxLoc(InputArray src, OutputArray minMaxVals, OutputArray loc, + InputArray mask = noArray(), Stream& stream = Stream::Null()); + +/** @brief Counts non-zero matrix elements. + +@param src Single-channel source image. + +The function does not work with CV_64F images on GPUs with the compute capability \< 1.3. + +@sa countNonZero + */ +CV_EXPORTS_W int countNonZero(InputArray src); +/** @overload */ +CV_EXPORTS_W void countNonZero(InputArray src, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Reduces a matrix to a vector. + +@param mtx Source 2D matrix. +@param vec Destination vector. Its size and type is defined by dim and dtype parameters. +@param dim Dimension index along which the matrix is reduced. 0 means that the matrix is reduced +to a single row. 1 means that the matrix is reduced to a single column. +@param reduceOp Reduction operation that could be one of the following: +- **CV_REDUCE_SUM** The output is the sum of all rows/columns of the matrix. +- **CV_REDUCE_AVG** The output is the mean vector of all rows/columns of the matrix. +- **CV_REDUCE_MAX** The output is the maximum (column/row-wise) of all rows/columns of the +matrix. +- **CV_REDUCE_MIN** The output is the minimum (column/row-wise) of all rows/columns of the +matrix. +@param dtype When it is negative, the destination vector will have the same type as the source +matrix. Otherwise, its type will be CV_MAKE_TYPE(CV_MAT_DEPTH(dtype), mtx.channels()) . +@param stream Stream for the asynchronous version. + +The function reduce reduces the matrix to a vector by treating the matrix rows/columns as a set of +1D vectors and performing the specified operation on the vectors until a single row/column is +obtained. For example, the function can be used to compute horizontal and vertical projections of a +raster image. In case of CV_REDUCE_SUM and CV_REDUCE_AVG , the output may have a larger element +bit-depth to preserve accuracy. And multi-channel arrays are also supported in these two reduction +modes. + +@sa reduce + */ +CV_EXPORTS_W void reduce(InputArray mtx, OutputArray vec, int dim, int reduceOp, int dtype = -1, Stream& stream = Stream::Null()); + +/** @brief Computes a mean value and a standard deviation of matrix elements. + +@param mtx Source matrix. CV_8UC1 matrices are supported for now. +@param mean Mean value. +@param stddev Standard deviation value. 
+ +@sa meanStdDev + */ +CV_EXPORTS_W void meanStdDev(InputArray mtx, Scalar& mean, Scalar& stddev); +/** @overload */ +CV_EXPORTS_W void meanStdDev(InputArray mtx, OutputArray dst, Stream& stream = Stream::Null()); + +/** @brief Computes a standard deviation of integral images. + +@param src Source image. Only the CV_32SC1 type is supported. +@param sqr Squared source image. Only the CV_32FC1 type is supported. +@param dst Destination image with the same type and size as src . +@param rect Rectangular window. +@param stream Stream for the asynchronous version. + */ +CV_EXPORTS_W void rectStdDev(InputArray src, InputArray sqr, OutputArray dst, Rect rect, Stream& stream = Stream::Null()); + +/** @brief Normalizes the norm or value range of an array. + +@param src Input array. +@param dst Output array of the same size as src . +@param alpha Norm value to normalize to or the lower range boundary in case of the range +normalization. +@param beta Upper range boundary in case of the range normalization; it is not used for the norm +normalization. +@param norm_type Normalization type ( NORM_MINMAX , NORM_L2 , NORM_L1 or NORM_INF ). +@param dtype When negative, the output array has the same type as src; otherwise, it has the same +number of channels as src and the depth =CV_MAT_DEPTH(dtype). +@param mask Optional operation mask. +@param stream Stream for the asynchronous version. + +@sa normalize + */ +CV_EXPORTS_W void normalize(InputArray src, OutputArray dst, double alpha, double beta, + int norm_type, int dtype, InputArray mask = noArray(), + Stream& stream = Stream::Null()); + +/** @brief Computes an integral image. + +@param src Source image. Only CV_8UC1 images are supported for now. +@param sum Integral image containing 32-bit unsigned integer values packed into CV_32SC1 . +@param stream Stream for the asynchronous version. + +@sa integral + */ +CV_EXPORTS_W void integral(InputArray src, OutputArray sum, Stream& stream = Stream::Null()); + +/** @brief Computes a squared integral image. + +@param src Source image. Only CV_8UC1 images are supported for now. +@param sqsum Squared integral image containing 64-bit unsigned integer values packed into +CV_64FC1 . +@param stream Stream for the asynchronous version. + */ +CV_EXPORTS_W void sqrIntegral(InputArray src, OutputArray sqsum, Stream& stream = Stream::Null()); + +//! @} cudaarithm_reduce + +//! @addtogroup cudaarithm_arithm +//! @{ + +/** @brief Performs generalized matrix multiplication. + +@param src1 First multiplied input matrix that should have CV_32FC1 , CV_64FC1 , CV_32FC2 , or +CV_64FC2 type. +@param src2 Second multiplied input matrix of the same type as src1 . +@param alpha Weight of the matrix product. +@param src3 Third optional delta matrix added to the matrix product. It should have the same type +as src1 and src2 . +@param beta Weight of src3 . +@param dst Destination matrix. It has the proper size and the same type as input matrices. +@param flags Operation flags: +- **GEMM_1_T** transpose src1 +- **GEMM_2_T** transpose src2 +- **GEMM_3_T** transpose src3 +@param stream Stream for the asynchronous version. + +The function performs generalized matrix multiplication similar to the gemm functions in BLAS level +3. For example, gemm(src1, src2, alpha, src3, beta, dst, GEMM_1_T + GEMM_3_T) corresponds to + +\f[\texttt{dst} = \texttt{alpha} \cdot \texttt{src1} ^T \cdot \texttt{src2} + \texttt{beta} \cdot \texttt{src3} ^T\f] + +@note Transposition operation doesn't support CV_64FC2 input type. 
+
+@sa gemm
+ */
+CV_EXPORTS_W void gemm(InputArray src1, InputArray src2, double alpha,
+                       InputArray src3, double beta, OutputArray dst, int flags = 0, Stream& stream = Stream::Null());
+
+/** @brief Performs a per-element multiplication of two Fourier spectrums.
+
+@param src1 First spectrum.
+@param src2 Second spectrum with the same size and type as src1.
+@param dst Destination spectrum.
+@param flags Mock parameter kept for CPU/CUDA interface similarity; simply pass `0`.
+@param conjB Optional flag to specify if the second spectrum needs to be conjugated before the
+multiplication.
+@param stream Stream for the asynchronous version.
+
+Only full (not packed) CV_32FC2 complex spectrums in the interleaved format are supported for now.
+
+@sa mulSpectrums
+ */
+CV_EXPORTS_W void mulSpectrums(InputArray src1, InputArray src2, OutputArray dst, int flags, bool conjB=false, Stream& stream = Stream::Null());
+
+/** @brief Performs a per-element multiplication of two Fourier spectrums and scales the result.
+
+@param src1 First spectrum.
+@param src2 Second spectrum with the same size and type as src1.
+@param dst Destination spectrum.
+@param flags Mock parameter kept for CPU/CUDA interface similarity; simply pass `0`.
+@param scale Scale constant.
+@param conjB Optional flag to specify if the second spectrum needs to be conjugated before the
+multiplication.
+@param stream Stream for the asynchronous version.
+
+Only full (not packed) CV_32FC2 complex spectrums in the interleaved format are supported for now.
+
+@sa mulSpectrums
+ */
+CV_EXPORTS_W void mulAndScaleSpectrums(InputArray src1, InputArray src2, OutputArray dst, int flags, float scale, bool conjB=false, Stream& stream = Stream::Null());
+
+/** @brief Performs a forward or inverse discrete Fourier transform (1D or 2D) of the floating-point matrix.
+
+@param src Source matrix (real or complex).
+@param dst Destination matrix (real or complex).
+@param dft_size Size of a discrete Fourier transform.
+@param flags Optional flags:
+- **DFT_ROWS** transforms each individual row of the source matrix.
+- **DFT_SCALE** scales the result: divide it by the number of elements in the transform
+(obtained from dft_size).
+- **DFT_INVERSE** inverts DFT. Use for complex-complex cases (real-complex and complex-real
+cases are always forward and inverse, respectively).
+- **DFT_COMPLEX_INPUT** specifies that the input is complex with 2 channels.
+- **DFT_REAL_OUTPUT** specifies the output as real. The source matrix is the result of a
+real-complex transform, so the destination matrix must be real.
+@param stream Stream for the asynchronous version.
+
+Use this function to handle real matrices (CV_32FC1) and complex matrices in the interleaved format (CV_32FC2).
+
+The source matrix should be continuous, otherwise reallocation and data copying are performed. The
+function chooses an operation mode depending on the flags, size, and channel count of the source
+matrix:
+
+- If the source matrix is complex and the output is not specified as real, the destination
+matrix is complex and has the dft_size size and CV_32FC2 type. The destination matrix
+contains a full result of the DFT (forward or inverse).
+- If the source matrix is complex and the output is specified as real, the function assumes that
+its input is the result of the forward transform (see the next item). The destination matrix
+has the dft_size size and CV_32FC1 type. It contains the result of the inverse DFT.
+- If the source matrix is real (its type is CV_32FC1), a forward DFT is performed. The result of
+the DFT is packed into a complex (CV_32FC2) matrix. So, the width of the destination matrix
+is dft_size.width / 2 + 1. But if the source is a single column, the height is reduced
+instead of the width.
+
+@sa dft
+ */
+CV_EXPORTS_W void dft(InputArray src, OutputArray dst, Size dft_size, int flags=0, Stream& stream = Stream::Null());
+
+/** @brief Base class for DFT operator as a cv::Algorithm.
+ */
+class CV_EXPORTS_W DFT : public Algorithm
+{
+public:
+    /** @brief Computes an FFT of a given image.
+
+    @param image Source image. Only CV_32FC1 images are supported for now.
+    @param result Result image.
+    @param stream Stream for the asynchronous version.
+     */
+    CV_WRAP virtual void compute(InputArray image, OutputArray result, Stream& stream = Stream::Null()) = 0;
+};
+
+/** @brief Creates implementation for cuda::DFT.
+
+@param dft_size The image size.
+@param flags Optional flags:
+- **DFT_ROWS** transforms each individual row of the source matrix.
+- **DFT_SCALE** scales the result: divide it by the number of elements in the transform
+(obtained from dft_size).
+- **DFT_INVERSE** inverts DFT. Use for complex-complex cases (real-complex and complex-real
+cases are always forward and inverse, respectively).
+- **DFT_COMPLEX_INPUT** specifies that inputs will be complex with 2 channels.
+- **DFT_REAL_OUTPUT** specifies the output as real. The source matrix is the result of a
+real-complex transform, so the destination matrix must be real.
+ */
+CV_EXPORTS_W Ptr<DFT> createDFT(Size dft_size, int flags);
+
+/** @brief Base class for convolution (or cross-correlation) operator.
+ */
+class CV_EXPORTS_W Convolution : public Algorithm
+{
+public:
+    /** @brief Computes a convolution (or cross-correlation) of two images.
+
+    @param image Source image. Only CV_32FC1 images are supported for now.
+    @param templ Template image. The size is not greater than the image size. The type is the same as
+    image.
+    @param result Result image. If image is *W x H* and templ is *w x h*, then result must be *W-w+1 x
+    H-h+1*.
+    @param ccorr Flag to evaluate cross-correlation instead of convolution.
+    @param stream Stream for the asynchronous version.
+     */
+    CV_WRAP virtual void convolve(InputArray image, InputArray templ, OutputArray result, bool ccorr = false, Stream& stream = Stream::Null()) = 0;
+};
+
+/** @brief Creates implementation for cuda::Convolution.
+
+@param user_block_size Block size. If you leave the default value Size(0,0), then an automatic
+estimation of the block size will be used (which is optimized for speed). By varying user_block_size
+you can reduce memory requirements at the cost of speed.
+ */
+CV_EXPORTS_W Ptr<Convolution> createConvolution(Size user_block_size = Size());
+
+//! @} cudaarithm_arithm
+
+//!
@} cudaarithm + +}} // namespace cv { namespace cuda { + +#endif /* OPENCV_CUDAARITHM_HPP */ diff --git a/modules/cudaarithm/misc/python/test/test_cudaarithm.py b/modules/cudaarithm/misc/python/test/test_cudaarithm.py new file mode 100644 index 00000000000..b068fae44bf --- /dev/null +++ b/modules/cudaarithm/misc/python/test/test_cudaarithm.py @@ -0,0 +1,178 @@ +#!/usr/bin/env python +import os +import cv2 as cv +import numpy as np + +from tests_common import NewOpenCVTests, unittest + +class cudaarithm_test(NewOpenCVTests): + def setUp(self): + super(cudaarithm_test, self).setUp() + if not cv.cuda.getCudaEnabledDeviceCount(): + self.skipTest("No CUDA-capable device is detected") + + def test_cudaarithm(self): + npMat = (np.random.random((128, 128, 3)) * 255).astype(np.uint8) + + cuMat = cv.cuda_GpuMat(npMat) + cuMatDst = cv.cuda_GpuMat(cuMat.size(),cuMat.type()) + cuMatB = cv.cuda_GpuMat(cuMat.size(),cv.CV_8UC1) + cuMatG = cv.cuda_GpuMat(cuMat.size(),cv.CV_8UC1) + cuMatR = cv.cuda_GpuMat(cuMat.size(),cv.CV_8UC1) + + self.assertTrue(np.allclose(cv.cuda.merge(cv.cuda.split(cuMat)),npMat)) + + cv.cuda.split(cuMat,[cuMatB,cuMatG,cuMatR]) + cv.cuda.merge([cuMatB,cuMatG,cuMatR],cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),npMat)) + + shift = (np.random.random((cuMat.channels(),)) * 8).astype(np.uint8).tolist() + self.assertTrue(np.allclose(cv.cuda.rshift(cuMat,shift).download(),npMat >> shift)) + cv.cuda.rshift(cuMat,shift,cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),npMat >> shift)) + + self.assertTrue(np.allclose(cv.cuda.lshift(cuMat,shift).download(),(npMat << shift).astype('uint8'))) + cv.cuda.lshift(cuMat,shift,cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),(npMat << shift).astype('uint8'))) + + def test_arithmetic(self): + npMat1 = np.random.random((128, 128, 3)) - 0.5 + npMat2 = np.random.random((128, 128, 3)) - 0.5 + + cuMat1 = cv.cuda_GpuMat() + cuMat2 = cv.cuda_GpuMat() + cuMat1.upload(npMat1) + cuMat2.upload(npMat2) + cuMatDst = cv.cuda_GpuMat(cuMat1.size(),cuMat1.type()) + + self.assertTrue(np.allclose(cv.cuda.add(cuMat1, cuMat2).download(), + cv.add(npMat1, npMat2))) + + cv.cuda.add(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.add(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.subtract(cuMat1, cuMat2).download(), + cv.subtract(npMat1, npMat2))) + + cv.cuda.subtract(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.subtract(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.multiply(cuMat1, cuMat2).download(), + cv.multiply(npMat1, npMat2))) + + cv.cuda.multiply(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.multiply(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.divide(cuMat1, cuMat2).download(), + cv.divide(npMat1, npMat2))) + + cv.cuda.divide(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.divide(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.absdiff(cuMat1, cuMat2).download(), + cv.absdiff(npMat1, npMat2))) + + cv.cuda.absdiff(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.absdiff(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.compare(cuMat1, cuMat2, cv.CMP_GE).download(), + cv.compare(npMat1, npMat2, cv.CMP_GE))) + + cuMatDst1 = cv.cuda_GpuMat(cuMat1.size(),cv.CV_8UC3) + cv.cuda.compare(cuMat1, cuMat2, cv.CMP_GE, cuMatDst1) + self.assertTrue(np.allclose(cuMatDst1.download(),cv.compare(npMat1, npMat2, cv.CMP_GE))) + + 
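+        # Each operation above and below is exercised twice: once letting
+        # OpenCV allocate the result, and once writing into the preallocated
+        # cuMatDst, so both binding code paths are covered.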
self.assertTrue(np.allclose(cv.cuda.abs(cuMat1).download(), + np.abs(npMat1))) + + cv.cuda.abs(cuMat1, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),np.abs(npMat1))) + + self.assertTrue(np.allclose(cv.cuda.sqrt(cv.cuda.sqr(cuMat1)).download(), + cv.cuda.abs(cuMat1).download())) + + cv.cuda.sqr(cuMat1, cuMatDst) + cv.cuda.sqrt(cuMatDst, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.cuda.abs(cuMat1).download())) + + self.assertTrue(np.allclose(cv.cuda.log(cv.cuda.exp(cuMat1)).download(), + npMat1)) + + cv.cuda.exp(cuMat1, cuMatDst) + cv.cuda.log(cuMatDst, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),npMat1)) + + self.assertTrue(np.allclose(cv.cuda.pow(cuMat1, 2).download(), + cv.pow(npMat1, 2))) + + cv.cuda.pow(cuMat1, 2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.pow(npMat1, 2))) + + def test_logical(self): + npMat1 = (np.random.random((128, 128)) * 255).astype(np.uint8) + npMat2 = (np.random.random((128, 128)) * 255).astype(np.uint8) + + cuMat1 = cv.cuda_GpuMat() + cuMat2 = cv.cuda_GpuMat() + cuMat1.upload(npMat1) + cuMat2.upload(npMat2) + cuMatDst = cv.cuda_GpuMat(cuMat1.size(),cuMat1.type()) + + self.assertTrue(np.allclose(cv.cuda.bitwise_or(cuMat1, cuMat2).download(), + cv.bitwise_or(npMat1, npMat2))) + + cv.cuda.bitwise_or(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.bitwise_or(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.bitwise_and(cuMat1, cuMat2).download(), + cv.bitwise_and(npMat1, npMat2))) + + cv.cuda.bitwise_and(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.bitwise_and(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.bitwise_xor(cuMat1, cuMat2).download(), + cv.bitwise_xor(npMat1, npMat2))) + + cv.cuda.bitwise_xor(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.bitwise_xor(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.bitwise_not(cuMat1).download(), + cv.bitwise_not(npMat1))) + + cv.cuda.bitwise_not(cuMat1, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.bitwise_not(npMat1))) + + self.assertTrue(np.allclose(cv.cuda.min(cuMat1, cuMat2).download(), + cv.min(npMat1, npMat2))) + + cv.cuda.min(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.min(npMat1, npMat2))) + + self.assertTrue(np.allclose(cv.cuda.max(cuMat1, cuMat2).download(), + cv.max(npMat1, npMat2))) + + cv.cuda.max(cuMat1, cuMat2, cuMatDst) + self.assertTrue(np.allclose(cuMatDst.download(),cv.max(npMat1, npMat2))) + + def test_convolution(self): + npMat = (np.random.random((128, 128)) * 255).astype(np.float32) + npDims = np.array(npMat.shape) + kernel = (np.random.random((3, 3)) * 1).astype(np.float32) + kernelDims = np.array(kernel.shape) + iS = (kernelDims/2).astype(int) + iE = npDims - kernelDims + iS + + cuMat = cv.cuda_GpuMat(npMat) + cuKernel= cv.cuda_GpuMat(kernel) + cuMatDst = cv.cuda_GpuMat(tuple(npDims - kernelDims + 1), cuMat.type()) + conv = cv.cuda.createConvolution() + + self.assertTrue(np.allclose(conv.convolve(cuMat,cuKernel,ccorr=True).download(), + cv.filter2D(npMat,-1,kernel,anchor=(-1,-1))[iS[0]:iE[0]+1,iS[1]:iE[1]+1])) + + conv.convolve(cuMat,cuKernel,cuMatDst,True) + self.assertTrue(np.allclose(cuMatDst.download(), + cv.filter2D(npMat,-1,kernel,anchor=(-1,-1))[iS[0]:iE[0]+1,iS[1]:iE[1]+1])) + +if __name__ == '__main__': + NewOpenCVTests.bootstrap() \ No newline at end of file diff --git a/modules/cudaarithm/perf/perf_arithm.cpp 
b/modules/cudaarithm/perf/perf_arithm.cpp new file mode 100644 index 00000000000..ca23e19dc14 --- /dev/null +++ b/modules/cudaarithm/perf/perf_arithm.cpp @@ -0,0 +1,254 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. 
+// +//M*/ + +#include "perf_precomp.hpp" + +namespace opencv_test { namespace { + +////////////////////////////////////////////////////////////////////// +// GEMM + +#ifdef HAVE_CUBLAS + +CV_FLAGS(GemmFlags, 0, cv::GEMM_1_T, cv::GEMM_2_T, cv::GEMM_3_T) +#define ALL_GEMM_FLAGS Values(GemmFlags(0), GemmFlags(cv::GEMM_1_T), GemmFlags(cv::GEMM_2_T), GemmFlags(cv::GEMM_3_T), \ + GemmFlags(cv::GEMM_1_T | cv::GEMM_2_T), GemmFlags(cv::GEMM_1_T | cv::GEMM_3_T), GemmFlags(cv::GEMM_1_T | cv::GEMM_2_T | cv::GEMM_3_T)) + +DEF_PARAM_TEST(Sz_Type_Flags, cv::Size, MatType, GemmFlags); + +PERF_TEST_P(Sz_Type_Flags, GEMM, + Combine(Values(cv::Size(512, 512), cv::Size(1024, 1024)), + Values(CV_32FC1, CV_32FC2, CV_64FC1), + ALL_GEMM_FLAGS)) +{ + const cv::Size size = GET_PARAM(0); + const int type = GET_PARAM(1); + const int flags = GET_PARAM(2); + + cv::Mat src1(size, type); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, type); + declare.in(src2, WARMUP_RNG); + + cv::Mat src3(size, type); + declare.in(src3, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + declare.time(5.0); + + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + const cv::cuda::GpuMat d_src3(src3); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::gemm(d_src1, d_src2, 1.0, d_src3, 1.0, dst, flags); + + CUDA_SANITY_CHECK(dst, 1e-6, ERROR_RELATIVE); + } + else + { + declare.time(50.0); + + cv::Mat dst; + + TEST_CYCLE() cv::gemm(src1, src2, 1.0, src3, 1.0, dst, flags); + + CPU_SANITY_CHECK(dst); + } +} + +#endif + +////////////////////////////////////////////////////////////////////// +// MulSpectrums + +CV_FLAGS(DftFlags, 0, cv::DFT_INVERSE, cv::DFT_SCALE, cv::DFT_ROWS, cv::DFT_COMPLEX_OUTPUT, cv::DFT_REAL_OUTPUT) + +DEF_PARAM_TEST(Sz_Flags, cv::Size, DftFlags); + +PERF_TEST_P(Sz_Flags, MulSpectrums, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(0, DftFlags(cv::DFT_ROWS)))) +{ + const cv::Size size = GET_PARAM(0); + const int flag = GET_PARAM(1); + + cv::Mat a(size, CV_32FC2); + cv::Mat b(size, CV_32FC2); + declare.in(a, b, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_a(a); + const cv::cuda::GpuMat d_b(b); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::mulSpectrums(d_a, d_b, dst, flag); + + CUDA_SANITY_CHECK(dst, 1e-6, ERROR_RELATIVE); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::mulSpectrums(a, b, dst, flag); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MulAndScaleSpectrums + +PERF_TEST_P(Sz, MulAndScaleSpectrums, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + const float scale = 1.f / size.area(); + + cv::Mat src1(size, CV_32FC2); + cv::Mat src2(size, CV_32FC2); + declare.in(src1,src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::mulAndScaleSpectrums(d_src1, d_src2, dst, cv::DFT_ROWS, scale, false); + + CUDA_SANITY_CHECK(dst, 1e-6, ERROR_RELATIVE); + } + else + { + FAIL_NO_CPU(); + } +} + +////////////////////////////////////////////////////////////////////// +// Dft + +PERF_TEST_P(Sz_Flags, Dft, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(0, DftFlags(cv::DFT_ROWS), DftFlags(cv::DFT_INVERSE)))) +{ + declare.time(10.0); + + const cv::Size size = GET_PARAM(0); + const int flag = GET_PARAM(1); + + cv::Mat src(size, CV_32FC2); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() 
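+        // cuda::dft takes the transform size as an explicit argument; here it
+        // matches the source size, so the transform runs unpadded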
cv::cuda::dft(d_src, dst, size, flag);
+
+        CUDA_SANITY_CHECK(dst, 1e-6, ERROR_RELATIVE);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::dft(src, dst, flag);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// Convolve
+
+DEF_PARAM_TEST(Sz_KernelSz_Ccorr, cv::Size, int, bool);
+
+PERF_TEST_P(Sz_KernelSz_Ccorr, Convolve,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(17, 27, 32, 64),
+                    Bool()))
+{
+    declare.time(10.0);
+
+    const cv::Size size = GET_PARAM(0);
+    const int templ_size = GET_PARAM(1);
+    const bool ccorr = GET_PARAM(2);
+
+    const cv::Mat image(size, CV_32FC1);
+    const cv::Mat templ(templ_size, templ_size, CV_32FC1);
+    declare.in(image, templ, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        cv::cuda::GpuMat d_image = cv::cuda::createContinuous(size, CV_32FC1);
+        d_image.upload(image);
+
+        cv::cuda::GpuMat d_templ = cv::cuda::createContinuous(templ_size, templ_size, CV_32FC1);
+        d_templ.upload(templ);
+
+        cv::Ptr<cv::cuda::Convolution> convolution = cv::cuda::createConvolution();
+
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() convolution->convolve(d_image, d_templ, dst, ccorr);
+
+        CUDA_SANITY_CHECK(dst, 1e-6, ERROR_RELATIVE);
+    }
+    else
+    {
+        if (ccorr)
+            FAIL_NO_CPU();
+
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::filter2D(image, dst, image.depth(), templ);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+}} // namespace
diff --git a/modules/cudaarithm/perf/perf_core.cpp b/modules/cudaarithm/perf/perf_core.cpp
new file mode 100644
index 00000000000..bc9f0e2f715
--- /dev/null
+++ b/modules/cudaarithm/perf/perf_core.cpp
@@ -0,0 +1,323 @@
+/*M///////////////////////////////////////////////////////////////////////////////////////
+//
+//  IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
+//
+//  By downloading, copying, installing or using the software you agree to this license.
+//  If you do not agree to this license, do not download, install,
+//  copy or use the software.
+//
+//
+//                           License Agreement
+//                For Open Source Computer Vision Library
+//
+// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
+// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
+// Third party copyrights are property of their respective owners.
+//
+// Redistribution and use in source and binary forms, with or without modification,
+// are permitted provided that the following conditions are met:
+//
+//   * Redistribution's of source code must retain the above copyright notice,
+//     this list of conditions and the following disclaimer.
+//
+//   * Redistribution's in binary form must reproduce the above copyright notice,
+//     this list of conditions and the following disclaimer in the documentation
+//     and/or other materials provided with the distribution.
+//
+//   * The name of the copyright holders may not be used to endorse or promote products
+//     derived from this software without specific prior written permission.
+//
+// This software is provided by the copyright holders and contributors "as is" and
+// any express or implied warranties, including, but not limited to, the implied
+// warranties of merchantability and fitness for a particular purpose are disclaimed.
+// In no event shall the Intel Corporation or contributors be liable for any direct,
+// indirect, incidental, special, exemplary, or consequential damages
+// (including, but not limited to, procurement of substitute goods or services;
+// loss of use, data, or profits; or business interruption) however caused
+// and on any theory of liability, whether in contract, strict liability,
+// or tort (including negligence or otherwise) arising in any way out of
+// the use of this software, even if advised of the possibility of such damage.
+//
+//M*/
+
+#include "perf_precomp.hpp"
+
+namespace opencv_test { namespace {
+
+#define ARITHM_MAT_DEPTH Values(CV_8U, CV_16U, CV_32F, CV_64F)
+
+//////////////////////////////////////////////////////////////////////
+// Merge
+
+DEF_PARAM_TEST(Sz_Depth_Cn, cv::Size, MatDepth, MatCn);
+
+PERF_TEST_P(Sz_Depth_Cn, Merge,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    ARITHM_MAT_DEPTH,
+                    Values(2, 3, 4)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+
+    std::vector<cv::Mat> src(channels);
+    for (int i = 0; i < channels; ++i)
+    {
+        src[i].create(size, depth);
+        declare.in(src[i], WARMUP_RNG);
+    }
+
+    if (PERF_RUN_CUDA())
+    {
+        std::vector<cv::cuda::GpuMat> d_src(channels);
+        for (int i = 0; i < channels; ++i)
+            d_src[i].upload(src[i]);
+
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::merge(d_src, dst);
+
+        CUDA_SANITY_CHECK(dst, 1e-10);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::merge(src, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// Split
+
+PERF_TEST_P(Sz_Depth_Cn, Split,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    ARITHM_MAT_DEPTH,
+                    Values(2, 3, 4)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+
+    cv::Mat src(size, CV_MAKE_TYPE(depth, channels));
+    declare.in(src, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        std::vector<cv::cuda::GpuMat> dst;
+
+        TEST_CYCLE() cv::cuda::split(d_src, dst);
+
+        const cv::cuda::GpuMat& dst0 = dst[0];
+        const cv::cuda::GpuMat& dst1 = dst[1];
+
+        CUDA_SANITY_CHECK(dst0, 1e-10);
+        CUDA_SANITY_CHECK(dst1, 1e-10);
+    }
+    else
+    {
+        std::vector<cv::Mat> dst;
+
+        TEST_CYCLE() cv::split(src, dst);
+
+        const cv::Mat& dst0 = dst[0];
+        const cv::Mat& dst1 = dst[1];
+
+        CPU_SANITY_CHECK(dst0);
+        CPU_SANITY_CHECK(dst1);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// Transpose
+
+PERF_TEST_P(Sz_Type, Transpose,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8UC1, CV_8UC4, CV_16UC2, CV_16SC2, CV_32SC1, CV_32SC2, CV_64FC1)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int type = GET_PARAM(1);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::transpose(d_src, dst);
+
+        CUDA_SANITY_CHECK(dst, 1e-10);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::transpose(src, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// Flip
+
+enum {FLIP_BOTH = 0, FLIP_X = 1, FLIP_Y = -1};
+CV_ENUM(FlipCode, FLIP_BOTH, FLIP_X, FLIP_Y)
+
+DEF_PARAM_TEST(Sz_Depth_Cn_Code, cv::Size, MatDepth, MatCn, FlipCode);
+
+PERF_TEST_P(Sz_Depth_Cn_Code, Flip,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32F),
+                    CUDA_CHANNELS_1_3_4,
+                    FlipCode::all()))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+    const int flipCode = GET_PARAM(3);
+
+    const int type = CV_MAKE_TYPE(depth, channels);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::flip(d_src, dst, flipCode);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::flip(src, dst, flipCode);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// LutOneChannel
+
+PERF_TEST_P(Sz_Type, LutOneChannel,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8UC1, CV_8UC3)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int type = GET_PARAM(1);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    cv::Mat lut(1, 256, CV_8UC1);
+    declare.in(lut, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        cv::Ptr<cv::cuda::LookUpTable> lutAlg = cv::cuda::createLookUpTable(lut);
+
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() lutAlg->transform(d_src, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::LUT(src, lut, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// LutMultiChannel
+
+PERF_TEST_P(Sz_Type, LutMultiChannel,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8UC3)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int type = GET_PARAM(1);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    cv::Mat lut(1, 256, CV_MAKE_TYPE(CV_8U, src.channels()));
+    declare.in(lut, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        cv::Ptr<cv::cuda::LookUpTable> lutAlg = cv::cuda::createLookUpTable(lut);
+
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() lutAlg->transform(d_src, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::LUT(src, lut, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// CopyMakeBorder
+
+DEF_PARAM_TEST(Sz_Depth_Cn_Border, cv::Size, MatDepth, MatCn, BorderMode);
+
+PERF_TEST_P(Sz_Depth_Cn_Border, CopyMakeBorder,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32F),
+                    CUDA_CHANNELS_1_3_4,
+                    ALL_BORDER_MODES))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+    const int borderMode = GET_PARAM(3);
+
+    const int type = CV_MAKE_TYPE(depth, channels);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::copyMakeBorder(d_src, dst, 5, 5, 5, 5, borderMode);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::copyMakeBorder(src, dst, 5, 5, 5, 5, borderMode);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+}} // namespace
diff --git a/modules/cudaarithm/perf/perf_element_operations.cpp b/modules/cudaarithm/perf/perf_element_operations.cpp
new file mode 100644
index 00000000000..9aa2d4e4e0f
--- /dev/null
+++ b/modules/cudaarithm/perf/perf_element_operations.cpp
@@ -0,0 +1,1504 @@
+/*M///////////////////////////////////////////////////////////////////////////////////////
+//
+//  IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
+//
+//  By downloading, copying, installing or using the software you agree to this license.
+//  If you do not agree to this license, do not download, install,
+//  copy or use the software.
+// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. +// +//M*/ + +#include "perf_precomp.hpp" + +namespace opencv_test { namespace { + +#define ARITHM_MAT_DEPTH Values(CV_8U, CV_16U, CV_32F, CV_64F) + +////////////////////////////////////////////////////////////////////// +// AddMat + +DEF_PARAM_TEST(Sz_Depth, cv::Size, MatDepth); + +PERF_TEST_P(Sz_Depth, AddMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::add(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::add(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// AddScalar + +PERF_TEST_P(Sz_Depth, AddScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar s; + declare.in(s, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::add(d_src, s, dst); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::add(src, s, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// SubtractMat + +PERF_TEST_P(Sz_Depth, SubtractMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); 
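+
+    // declare.in(..., WARMUP_RNG) below registers each matrix with the perf
+    // harness and fills it with random data during warm-up; every benchmark
+    // in this file follows the same pattern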
+ + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::subtract(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::subtract(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// SubtractScalar + +PERF_TEST_P(Sz_Depth, SubtractScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar s; + declare.in(s, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::subtract(d_src, s, dst); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::subtract(src, s, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MultiplyMat + +PERF_TEST_P(Sz_Depth, MultiplyMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::multiply(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst, 1e-6); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::multiply(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MultiplyScalar + +PERF_TEST_P(Sz_Depth, MultiplyScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar s; + declare.in(s, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::multiply(d_src, s, dst); + + CUDA_SANITY_CHECK(dst, 1e-6); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::multiply(src, s, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// DivideMat + +PERF_TEST_P(Sz_Depth, DivideMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::divide(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst, 1e-6); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::divide(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// DivideScalar + +PERF_TEST_P(Sz_Depth, DivideScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar s; + 
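+    // the scalar operand is randomized by the harness as well; the
+    // scalar-first overload is benchmarked separately in DivideScalarInv below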
declare.in(s, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::divide(d_src, s, dst); + + CUDA_SANITY_CHECK(dst, 1e-6); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::divide(src, s, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// DivideScalarInv + +PERF_TEST_P(Sz_Depth, DivideScalarInv, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar s; + declare.in(s, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::divide(s[0], d_src, dst); + + CUDA_SANITY_CHECK(dst, 1e-6); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::divide(s, src, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// AbsDiffMat + +PERF_TEST_P(Sz_Depth, AbsDiffMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::absdiff(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::absdiff(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// AbsDiffScalar + +PERF_TEST_P(Sz_Depth, AbsDiffScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar s; + declare.in(s, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::absdiff(d_src, s, dst); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::absdiff(src, s, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// Abs + +PERF_TEST_P(Sz_Depth, Abs, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_16S, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::abs(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + FAIL_NO_CPU(); + } +} + +////////////////////////////////////////////////////////////////////// +// Sqr + +PERF_TEST_P(Sz_Depth, Sqr, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16S, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::sqr(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + FAIL_NO_CPU(); + } +} + +////////////////////////////////////////////////////////////////////// +// Sqrt + +PERF_TEST_P(Sz_Depth, Sqrt, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16S, CV_32F))) +{ + const 
cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + cv::randu(src, 0, 100000); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::sqrt(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::sqrt(src, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// Log + +PERF_TEST_P(Sz_Depth, Log, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16S, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + cv::randu(src, 0, 100000); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::log(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::log(src, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// Exp + +PERF_TEST_P(Sz_Depth, Exp, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16S, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + cv::randu(src, 0, 10); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::exp(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::exp(src, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// Pow + +DEF_PARAM_TEST(Sz_Depth_Power, cv::Size, MatDepth, double); + +PERF_TEST_P(Sz_Depth_Power, Pow, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16S, CV_32F), + Values(0.3, 2.0, 2.4))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const double power = GET_PARAM(2); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::pow(d_src, power, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::pow(src, power, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// CompareMat + +CV_ENUM(CmpCode, cv::CMP_EQ, cv::CMP_GT, cv::CMP_GE, cv::CMP_LT, cv::CMP_LE, cv::CMP_NE) + +DEF_PARAM_TEST(Sz_Depth_Code, cv::Size, MatDepth, CmpCode); + +PERF_TEST_P(Sz_Depth_Code, CompareMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH, + CmpCode::all())) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int cmp_code = GET_PARAM(2); + + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::compare(d_src1, d_src2, dst, cmp_code); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::compare(src1, src2, dst, cmp_code); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// CompareScalar + +PERF_TEST_P(Sz_Depth_Code, CompareScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + ARITHM_MAT_DEPTH, + CmpCode::all())) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int cmp_code = GET_PARAM(2); + + cv::Mat src(size, depth); 
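+    // the same randomized source is compared against a scalar under every
+    // comparison code enumerated by CmpCode::all()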
+    declare.in(src, WARMUP_RNG);
+
+    cv::Scalar s;
+    declare.in(s, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::compare(d_src, s, dst, cmp_code);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::compare(src, s, dst, cmp_code);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// BitwiseNot
+
+PERF_TEST_P(Sz_Depth, BitwiseNot,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+
+    cv::Mat src(size, depth);
+    declare.in(src, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::bitwise_not(d_src, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::bitwise_not(src, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// BitwiseAndMat
+
+PERF_TEST_P(Sz_Depth, BitwiseAndMat,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+
+    cv::Mat src1(size, depth);
+    declare.in(src1, WARMUP_RNG);
+
+    cv::Mat src2(size, depth);
+    declare.in(src2, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src1(src1);
+        const cv::cuda::GpuMat d_src2(src2);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::bitwise_and(d_src1, d_src2, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::bitwise_and(src1, src2, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// BitwiseAndScalar
+
+DEF_PARAM_TEST(Sz_Depth_Cn, cv::Size, MatDepth, MatCn);
+
+PERF_TEST_P(Sz_Depth_Cn, BitwiseAndScalar,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S),
+                    CUDA_CHANNELS_1_3_4))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+
+    const int type = CV_MAKE_TYPE(depth, channels);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    cv::Scalar s;
+    declare.in(s, WARMUP_RNG);
+    cv::Scalar_<int> is = s;
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::bitwise_and(d_src, is, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::bitwise_and(src, is, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// BitwiseOrMat
+
+PERF_TEST_P(Sz_Depth, BitwiseOrMat,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+
+    cv::Mat src1(size, depth);
+    declare.in(src1, WARMUP_RNG);
+
+    cv::Mat src2(size, depth);
+    declare.in(src2, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src1(src1);
+        const cv::cuda::GpuMat d_src2(src2);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::bitwise_or(d_src1, d_src2, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::bitwise_or(src1, src2, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// BitwiseOrScalar
+
+PERF_TEST_P(Sz_Depth_Cn, BitwiseOrScalar,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S),
+                    CUDA_CHANNELS_1_3_4))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+
+    const int type = CV_MAKE_TYPE(depth, channels);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    cv::Scalar s;
+    declare.in(s, WARMUP_RNG);
+    cv::Scalar_<int> is = s;
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::bitwise_or(d_src, is, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::bitwise_or(src, is, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// BitwiseXorMat
+
+PERF_TEST_P(Sz_Depth, BitwiseXorMat,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S)))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+
+    cv::Mat src1(size, depth);
+    declare.in(src1, WARMUP_RNG);
+
+    cv::Mat src2(size, depth);
+    declare.in(src2, WARMUP_RNG);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src1(src1);
+        const cv::cuda::GpuMat d_src2(src2);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::bitwise_xor(d_src1, d_src2, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::bitwise_xor(src1, src2, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// BitwiseXorScalar
+
+PERF_TEST_P(Sz_Depth_Cn, BitwiseXorScalar,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S),
+                    CUDA_CHANNELS_1_3_4))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+
+    const int type = CV_MAKE_TYPE(depth, channels);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    cv::Scalar s;
+    declare.in(s, WARMUP_RNG);
+    cv::Scalar_<int> is = s;
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::bitwise_xor(d_src, is, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        cv::Mat dst;
+
+        TEST_CYCLE() cv::bitwise_xor(src, is, dst);
+
+        CPU_SANITY_CHECK(dst);
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// RShift
+
+PERF_TEST_P(Sz_Depth_Cn, RShift,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S),
+                    CUDA_CHANNELS_1_3_4))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+
+    const int type = CV_MAKE_TYPE(depth, channels);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    const cv::Scalar_<int> val = cv::Scalar_<int>::all(4);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::rshift(d_src, val, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        FAIL_NO_CPU();
+    }
+}
+
+//////////////////////////////////////////////////////////////////////
+// LShift
+
+PERF_TEST_P(Sz_Depth_Cn, LShift,
+            Combine(CUDA_TYPICAL_MAT_SIZES,
+                    Values(CV_8U, CV_16U, CV_32S),
+                    CUDA_CHANNELS_1_3_4))
+{
+    const cv::Size size = GET_PARAM(0);
+    const int depth = GET_PARAM(1);
+    const int channels = GET_PARAM(2);
+
+    const int type = CV_MAKE_TYPE(depth, channels);
+
+    cv::Mat src(size, type);
+    declare.in(src, WARMUP_RNG);
+
+    const cv::Scalar_<int> val = cv::Scalar_<int>::all(4);
+
+    if (PERF_RUN_CUDA())
+    {
+        const cv::cuda::GpuMat d_src(src);
+        cv::cuda::GpuMat dst;
+
+        TEST_CYCLE() cv::cuda::lshift(d_src, val, dst);
+
+        CUDA_SANITY_CHECK(dst);
+    }
+    else
+    {
+        FAIL_NO_CPU();
+    }
+}
+
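+// Note: cv::cuda::rshift/lshift have no cv:: CPU counterpart, hence the
+// FAIL_NO_CPU() branches above. A hypothetical invocation sketch, assuming the
+// usual perf binary naming, to run only these benchmarks on the CUDA backend:
+//
+//     ./opencv_perf_cudaarithm --gtest_filter=*RShift*:*LShift* --perf_impl=cuda
+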
+////////////////////////////////////////////////////////////////////// +// MinMat + +PERF_TEST_P(Sz_Depth, MinMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::min(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::min(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MinScalar + +PERF_TEST_P(Sz_Depth, MinScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar val; + declare.in(val, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::min(d_src, val[0], dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::min(src, val[0], dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MaxMat + +PERF_TEST_P(Sz_Depth, MaxMat, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src1(size, depth); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::max(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::max(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MaxScalar + +PERF_TEST_P(Sz_Depth, MaxScalar, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + cv::Scalar val; + declare.in(val, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::max(d_src, val[0], dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::max(src, val[0], dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// AddWeighted + +DEF_PARAM_TEST(Sz_3Depth, cv::Size, MatDepth, MatDepth, MatDepth); + +PERF_TEST_P(Sz_3Depth, AddWeighted, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F, CV_64F), + Values(CV_8U, CV_16U, CV_32F, CV_64F), + Values(CV_8U, CV_16U, CV_32F, CV_64F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth1 = GET_PARAM(1); + const int depth2 = GET_PARAM(2); + const int dst_depth = GET_PARAM(3); + + cv::Mat src1(size, depth1); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, depth2); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::addWeighted(d_src1, 0.5, d_src2, 0.5, 
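+            // computes dst = 0.5*src1 + 0.5*src2 + 10.0 at the requested dst_depth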
10.0, dst, dst_depth); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::addWeighted(src1, 0.5, src2, 0.5, 10.0, dst, dst_depth); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MagnitudeComplex + +PERF_TEST_P(Sz, MagnitudeComplex, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + cv::Mat src(size, CV_32FC2); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::magnitude(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat xy[2]; + cv::split(src, xy); + + cv::Mat dst; + + TEST_CYCLE() cv::magnitude(xy[0], xy[1], dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MagnitudeSqrComplex + +PERF_TEST_P(Sz, MagnitudeSqrComplex, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + cv::Mat src(size, CV_32FC2); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::magnitudeSqr(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + FAIL_NO_CPU(); + } +} + +////////////////////////////////////////////////////////////////////// +// Magnitude + +PERF_TEST_P(Sz, Magnitude, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + cv::Mat src1(size, CV_32FC1); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, CV_32FC1); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::magnitude(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::magnitude(src1, src2, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MagnitudeSqr + +PERF_TEST_P(Sz, MagnitudeSqr, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + cv::Mat src1(size, CV_32FC1); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, CV_32FC1); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::magnitudeSqr(d_src1, d_src2, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + FAIL_NO_CPU(); + } +} + +////////////////////////////////////////////////////////////////////// +// Phase + +DEF_PARAM_TEST(Sz_AngleInDegrees, cv::Size, bool); +DEF_PARAM_TEST(Sz_Type_AngleInDegrees, cv::Size, MatType, bool); + +PERF_TEST_P(Sz_AngleInDegrees, Phase, + Combine(CUDA_TYPICAL_MAT_SIZES, + Bool())) +{ + const cv::Size size = GET_PARAM(0); + const bool angleInDegrees = GET_PARAM(1); + + cv::Mat src1(size, CV_32FC1); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, CV_32FC1); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::phase(d_src1, d_src2, dst, angleInDegrees); + + CUDA_SANITY_CHECK(dst, 1e-6, ERROR_RELATIVE); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::phase(src1, src2, dst, angleInDegrees); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// CartToPolar + +PERF_TEST_P(Sz_AngleInDegrees, CartToPolar, + Combine(CUDA_TYPICAL_MAT_SIZES, + Bool())) 
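+// converts Cartesian (x,y) pairs to polar (magnitude,angle); angleInDegrees
+// selects degrees instead of radians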
+{ + const cv::Size size = GET_PARAM(0); + const bool angleInDegrees = GET_PARAM(1); + + cv::Mat src1(size, CV_32FC1); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, CV_32FC1); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + cv::cuda::GpuMat magnitude; + cv::cuda::GpuMat angle; + + TEST_CYCLE() cv::cuda::cartToPolar(d_src1, d_src2, magnitude, angle, angleInDegrees); + + CUDA_SANITY_CHECK(magnitude); + CUDA_SANITY_CHECK(angle, 1e-6, ERROR_RELATIVE); + } + else + { + cv::Mat magnitude; + cv::Mat angle; + + TEST_CYCLE() cv::cartToPolar(src1, src2, magnitude, angle, angleInDegrees); + + CPU_SANITY_CHECK(magnitude); + CPU_SANITY_CHECK(angle); + } +} + +////////////////////////////////////////////////////////////////////// +// PolarToCart + +PERF_TEST_P(Sz_Type_AngleInDegrees, PolarToCart, + Combine(CUDA_TYPICAL_MAT_SIZES, + testing::Values(CV_32FC1, CV_64FC1), + Bool())) +{ + const cv::Size size = GET_PARAM(0); + const int type = GET_PARAM(1); + const bool angleInDegrees = GET_PARAM(2); + + cv::Mat magnitude(size, type); + declare.in(magnitude, WARMUP_RNG); + + cv::Mat angle(size, type); + declare.in(angle, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_magnitude(magnitude); + const cv::cuda::GpuMat d_angle(angle); + cv::cuda::GpuMat x; + cv::cuda::GpuMat y; + + TEST_CYCLE() cv::cuda::polarToCart(d_magnitude, d_angle, x, y, angleInDegrees); + + CUDA_SANITY_CHECK(x); + CUDA_SANITY_CHECK(y); + } + else + { + cv::Mat x; + cv::Mat y; + + TEST_CYCLE() cv::polarToCart(magnitude, angle, x, y, angleInDegrees); + + CPU_SANITY_CHECK(x); + CPU_SANITY_CHECK(y); + } +} + +////////////////////////////////////////////////////////////////////// +// Threshold + +CV_ENUM(ThreshOp, cv::THRESH_BINARY, cv::THRESH_BINARY_INV, cv::THRESH_TRUNC, cv::THRESH_TOZERO, cv::THRESH_TOZERO_INV) + +DEF_PARAM_TEST(Sz_Depth_Op, cv::Size, MatDepth, ThreshOp); + +PERF_TEST_P(Sz_Depth_Op, Threshold, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F, CV_64F), + ThreshOp::all())) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int threshOp = GET_PARAM(2); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::threshold(d_src, dst, 100.0, 255.0, threshOp); + + CUDA_SANITY_CHECK(dst, 1e-10); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::threshold(src, dst, 100.0, 255.0, threshOp); + + CPU_SANITY_CHECK(dst); + } +} + +}} // namespace diff --git a/modules/cudaarithm/perf/perf_main.cpp b/modules/cudaarithm/perf/perf_main.cpp new file mode 100644 index 00000000000..118d7596ac2 --- /dev/null +++ b/modules/cudaarithm/perf/perf_main.cpp @@ -0,0 +1,47 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. 
+// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. +// +//M*/ + +#include "perf_precomp.hpp" + +using namespace perf; + +CV_PERF_TEST_CUDA_MAIN(cudaarithm) diff --git a/modules/cudaarithm/perf/perf_precomp.hpp b/modules/cudaarithm/perf/perf_precomp.hpp new file mode 100644 index 00000000000..071ac946537 --- /dev/null +++ b/modules/cudaarithm/perf/perf_precomp.hpp @@ -0,0 +1,55 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. 
+// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. +// +//M*/ +#ifndef __OPENCV_PERF_PRECOMP_HPP__ +#define __OPENCV_PERF_PRECOMP_HPP__ + +#include "opencv2/ts.hpp" +#include "opencv2/ts/cuda_perf.hpp" + +#include "opencv2/cudaarithm.hpp" + +namespace opencv_test { +using namespace perf; +using namespace testing; +} + +#endif diff --git a/modules/cudaarithm/perf/perf_reductions.cpp b/modules/cudaarithm/perf/perf_reductions.cpp new file mode 100644 index 00000000000..71bb5524a63 --- /dev/null +++ b/modules/cudaarithm/perf/perf_reductions.cpp @@ -0,0 +1,520 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. 
+// +//M*/ + +#include "perf_precomp.hpp" + +namespace opencv_test { namespace { + +////////////////////////////////////////////////////////////////////// +// Norm + +DEF_PARAM_TEST(Sz_Depth_Norm, cv::Size, MatDepth, NormType); + +PERF_TEST_P(Sz_Depth_Norm, Norm, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32S, CV_32F), + Values(NormType(cv::NORM_INF), NormType(cv::NORM_L1), NormType(cv::NORM_L2)))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int normType = GET_PARAM(2); + + cv::Mat src(size, depth); + if (depth == CV_8U) + cv::randu(src, 0, 254); + else + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat d_buf; + double gpu_dst; + + TEST_CYCLE() gpu_dst = cv::cuda::norm(d_src, normType, d_buf); + + SANITY_CHECK(gpu_dst, 1e-6, ERROR_RELATIVE); + } + else + { + double cpu_dst; + + TEST_CYCLE() cpu_dst = cv::norm(src, normType); + + SANITY_CHECK(cpu_dst, 1e-6, ERROR_RELATIVE); + } +} + +////////////////////////////////////////////////////////////////////// +// NormDiff + +DEF_PARAM_TEST(Sz_Norm, cv::Size, NormType); + +PERF_TEST_P(Sz_Norm, NormDiff, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(NormType(cv::NORM_INF), NormType(cv::NORM_L1), NormType(cv::NORM_L2)))) +{ + const cv::Size size = GET_PARAM(0); + const int normType = GET_PARAM(1); + + cv::Mat src1(size, CV_8UC1); + declare.in(src1, WARMUP_RNG); + + cv::Mat src2(size, CV_8UC1); + declare.in(src2, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src1(src1); + const cv::cuda::GpuMat d_src2(src2); + double gpu_dst; + + TEST_CYCLE() gpu_dst = cv::cuda::norm(d_src1, d_src2, normType); + + SANITY_CHECK(gpu_dst); + + } + else + { + double cpu_dst; + + TEST_CYCLE() cpu_dst = cv::norm(src1, src2, normType); + + SANITY_CHECK(cpu_dst); + } +} + +////////////////////////////////////////////////////////////////////// +// Sum + +DEF_PARAM_TEST(Sz_Depth_Cn, cv::Size, MatDepth, MatCn); + +PERF_TEST_P(Sz_Depth_Cn, Sum, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F), + CUDA_CHANNELS_1_3_4)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int channels = GET_PARAM(2); + + const int type = CV_MAKE_TYPE(depth, channels); + + cv::Mat src(size, type); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::Scalar gpu_dst; + + TEST_CYCLE() gpu_dst = cv::cuda::sum(d_src); + + SANITY_CHECK(gpu_dst, 1e-5, ERROR_RELATIVE); + } + else + { + cv::Scalar cpu_dst; + + TEST_CYCLE() cpu_dst = cv::sum(src); + + SANITY_CHECK(cpu_dst, 1e-6, ERROR_RELATIVE); + } +} + +////////////////////////////////////////////////////////////////////// +// SumAbs + +PERF_TEST_P(Sz_Depth_Cn, SumAbs, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F), + CUDA_CHANNELS_1_3_4)) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int channels = GET_PARAM(2); + + const int type = CV_MAKE_TYPE(depth, channels); + + cv::Mat src(size, type); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::Scalar gpu_dst; + + TEST_CYCLE() gpu_dst = cv::cuda::absSum(d_src); + + SANITY_CHECK(gpu_dst, 1e-6, ERROR_RELATIVE); + } + else + { + FAIL_NO_CPU(); + } +} + +////////////////////////////////////////////////////////////////////// +// SumSqr + +PERF_TEST_P(Sz_Depth_Cn, SumSqr, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F), + CUDA_CHANNELS_1_3_4)) +{ + const 
cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int channels = GET_PARAM(2); + + const int type = CV_MAKE_TYPE(depth, channels); + + cv::Mat src(size, type); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::Scalar gpu_dst; + + TEST_CYCLE() gpu_dst = cv::cuda::sqrSum(d_src); + + SANITY_CHECK(gpu_dst, 1e-6, ERROR_RELATIVE); + } + else + { + FAIL_NO_CPU(); + } +} + +////////////////////////////////////////////////////////////////////// +// MinMax + +DEF_PARAM_TEST(Sz_Depth, cv::Size, MatDepth); + +PERF_TEST_P(Sz_Depth, MinMax, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F, CV_64F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + if (depth == CV_8U) + cv::randu(src, 0, 254); + else + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + double gpu_minVal, gpu_maxVal; + + TEST_CYCLE() cv::cuda::minMax(d_src, &gpu_minVal, &gpu_maxVal, cv::cuda::GpuMat()); + + SANITY_CHECK(gpu_minVal, 1e-10); + SANITY_CHECK(gpu_maxVal, 1e-10); + } + else + { + double cpu_minVal, cpu_maxVal; + + TEST_CYCLE() cv::minMaxLoc(src, &cpu_minVal, &cpu_maxVal); + + SANITY_CHECK(cpu_minVal); + SANITY_CHECK(cpu_maxVal); + } +} + +////////////////////////////////////////////////////////////////////// +// MinMaxLoc + +PERF_TEST_P(Sz_Depth, MinMaxLoc, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F, CV_64F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + if (depth == CV_8U) + cv::randu(src, 0, 254); + else + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + double gpu_minVal, gpu_maxVal; + cv::Point gpu_minLoc, gpu_maxLoc; + + TEST_CYCLE() cv::cuda::minMaxLoc(d_src, &gpu_minVal, &gpu_maxVal, &gpu_minLoc, &gpu_maxLoc); + + SANITY_CHECK(gpu_minVal, 1e-10); + SANITY_CHECK(gpu_maxVal, 1e-10); + } + else + { + double cpu_minVal, cpu_maxVal; + cv::Point cpu_minLoc, cpu_maxLoc; + + TEST_CYCLE() cv::minMaxLoc(src, &cpu_minVal, &cpu_maxVal, &cpu_minLoc, &cpu_maxLoc); + + SANITY_CHECK(cpu_minVal); + SANITY_CHECK(cpu_maxVal); + } +} + +////////////////////////////////////////////////////////////////////// +// CountNonZero + +PERF_TEST_P(Sz_Depth, CountNonZero, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F, CV_64F))) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + + cv::Mat src(size, depth); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + int gpu_dst = 0; + + TEST_CYCLE() gpu_dst = cv::cuda::countNonZero(d_src); + + SANITY_CHECK(gpu_dst); + } + else + { + int cpu_dst = 0; + + TEST_CYCLE() cpu_dst = cv::countNonZero(src); + + SANITY_CHECK(cpu_dst); + } +} + +////////////////////////////////////////////////////////////////////// +// Reduce + +CV_ENUM(ReduceCode, REDUCE_SUM, REDUCE_AVG, REDUCE_MAX, REDUCE_MIN) + +enum {Rows = 0, Cols = 1}; +CV_ENUM(ReduceDim, Rows, Cols) + +DEF_PARAM_TEST(Sz_Depth_Cn_Code_Dim, cv::Size, MatDepth, MatCn, ReduceCode, ReduceDim); + +PERF_TEST_P(Sz_Depth_Cn_Code_Dim, Reduce, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_16S, CV_32F), + Values(1, 2, 3, 4), + ReduceCode::all(), + ReduceDim::all())) +{ + const cv::Size size = GET_PARAM(0); + const int depth = GET_PARAM(1); + const int channels = GET_PARAM(2); + const int reduceOp = GET_PARAM(3); + const int dim = GET_PARAM(4); + + 
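+    // Note: both branches below request a CV_32F accumulator, and the CUDA
+    // result (a row or a column, depending on `dim`) is reshaped to a single
+    // row so that the GPU and CPU outputs go through the same sanity check.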
const int type = CV_MAKE_TYPE(depth, channels); + + cv::Mat src(size, type); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::reduce(d_src, dst, dim, reduceOp, CV_32F); + + dst = dst.reshape(dst.channels(), 1); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::reduce(src, dst, dim, reduceOp, CV_32F); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// Normalize + +DEF_PARAM_TEST(Sz_Depth_NormType, cv::Size, MatDepth, NormType); + +PERF_TEST_P(Sz_Depth_NormType, Normalize, + Combine(CUDA_TYPICAL_MAT_SIZES, + Values(CV_8U, CV_16U, CV_32F, CV_64F), + Values(NormType(cv::NORM_INF), + NormType(cv::NORM_L1), + NormType(cv::NORM_L2), + NormType(cv::NORM_MINMAX)))) +{ + const cv::Size size = GET_PARAM(0); + const int type = GET_PARAM(1); + const int norm_type = GET_PARAM(2); + + const double alpha = 1; + const double beta = 0; + + cv::Mat src(size, type); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::normalize(d_src, dst, alpha, beta, norm_type, type, cv::cuda::GpuMat()); + + CUDA_SANITY_CHECK(dst, 1e-6); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::normalize(src, dst, alpha, beta, norm_type, type); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// MeanStdDev + +PERF_TEST_P(Sz, MeanStdDev, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + cv::Mat src(size, CV_8UC1); + declare.in(src, WARMUP_RNG); + + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::Scalar gpu_mean; + cv::Scalar gpu_stddev; + + TEST_CYCLE() cv::cuda::meanStdDev(d_src, gpu_mean, gpu_stddev); + + SANITY_CHECK(gpu_mean); + SANITY_CHECK(gpu_stddev); + } + else + { + cv::Scalar cpu_mean; + cv::Scalar cpu_stddev; + + TEST_CYCLE() cv::meanStdDev(src, cpu_mean, cpu_stddev); + + SANITY_CHECK(cpu_mean); + SANITY_CHECK(cpu_stddev); + } +} + +////////////////////////////////////////////////////////////////////// +// Integral + +PERF_TEST_P(Sz, Integral, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + cv::Mat src(size, CV_8UC1); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::integral(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + cv::Mat dst; + + TEST_CYCLE() cv::integral(src, dst); + + CPU_SANITY_CHECK(dst); + } +} + +////////////////////////////////////////////////////////////////////// +// IntegralSqr + +PERF_TEST_P(Sz, IntegralSqr, + CUDA_TYPICAL_MAT_SIZES) +{ + const cv::Size size = GetParam(); + + cv::Mat src(size, CV_8UC1); + declare.in(src, WARMUP_RNG); + + if (PERF_RUN_CUDA()) + { + const cv::cuda::GpuMat d_src(src); + cv::cuda::GpuMat dst; + + TEST_CYCLE() cv::cuda::sqrIntegral(d_src, dst); + + CUDA_SANITY_CHECK(dst); + } + else + { + FAIL_NO_CPU(); + } +} + +}} // namespace diff --git a/modules/cudaarithm/src/arithm.cpp b/modules/cudaarithm/src/arithm.cpp new file mode 100644 index 00000000000..381580cff43 --- /dev/null +++ b/modules/cudaarithm/src/arithm.cpp @@ -0,0 +1,582 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. 
+// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. +// +//M*/ + +#include "precomp.hpp" + +using namespace cv; +using namespace cv::cuda; + +#if !defined (HAVE_CUDA) || defined (CUDA_DISABLER) + +void cv::cuda::gemm(InputArray, InputArray, double, InputArray, double, OutputArray, int, Stream&) { throw_no_cuda(); } + +void cv::cuda::mulSpectrums(InputArray, InputArray, OutputArray, int, bool, Stream&) { throw_no_cuda(); } +void cv::cuda::mulAndScaleSpectrums(InputArray, InputArray, OutputArray, int, float, bool, Stream&) { throw_no_cuda(); } + +void cv::cuda::dft(InputArray, OutputArray, Size, int, Stream&) { throw_no_cuda(); } + +Ptr cv::cuda::createConvolution(Size) { throw_no_cuda(); return Ptr(); } + +#else /* !defined (HAVE_CUDA) */ + +namespace +{ + #define error_entry(entry) { entry, #entry } + + struct ErrorEntry + { + int code; + const char* str; + }; + + struct ErrorEntryComparer + { + int code; + ErrorEntryComparer(int code_) : code(code_) {} + bool operator()(const ErrorEntry& e) const { return e.code == code; } + }; + + String getErrorString(int code, const ErrorEntry* errors, size_t n) + { + size_t idx = std::find_if(errors, errors + n, ErrorEntryComparer(code)) - errors; + + const char* msg = (idx != n) ? 
errors[idx].str : "Unknown error code";
+        String str = cv::format("%s [Code = %d]", msg, code);
+
+        return str;
+    }
+}
+
+#ifdef HAVE_CUBLAS
+    namespace
+    {
+        const ErrorEntry cublas_errors[] =
+        {
+            error_entry( CUBLAS_STATUS_SUCCESS ),
+            error_entry( CUBLAS_STATUS_NOT_INITIALIZED ),
+            error_entry( CUBLAS_STATUS_ALLOC_FAILED ),
+            error_entry( CUBLAS_STATUS_INVALID_VALUE ),
+            error_entry( CUBLAS_STATUS_ARCH_MISMATCH ),
+            error_entry( CUBLAS_STATUS_MAPPING_ERROR ),
+            error_entry( CUBLAS_STATUS_EXECUTION_FAILED ),
+            error_entry( CUBLAS_STATUS_INTERNAL_ERROR )
+        };
+
+        const size_t cublas_error_num = sizeof(cublas_errors) / sizeof(cublas_errors[0]);
+
+        static inline void ___cublasSafeCall(cublasStatus_t err, const char* file, const int line, const char* func)
+        {
+            if (CUBLAS_STATUS_SUCCESS != err)
+            {
+                String msg = getErrorString(err, cublas_errors, cublas_error_num);
+                cv::error(cv::Error::GpuApiCallError, msg, func, file, line);
+            }
+        }
+    }
+
+    #define cublasSafeCall(expr) ___cublasSafeCall(expr, __FILE__, __LINE__, CV_Func)
+#endif // HAVE_CUBLAS
+
+#ifdef HAVE_CUFFT
+    namespace
+    {
+        //////////////////////////////////////////////////////////////////////////
+        // CUFFT errors
+
+        const ErrorEntry cufft_errors[] =
+        {
+            error_entry( CUFFT_INVALID_PLAN ),
+            error_entry( CUFFT_ALLOC_FAILED ),
+            error_entry( CUFFT_INVALID_TYPE ),
+            error_entry( CUFFT_INVALID_VALUE ),
+            error_entry( CUFFT_INTERNAL_ERROR ),
+            error_entry( CUFFT_EXEC_FAILED ),
+            error_entry( CUFFT_SETUP_FAILED ),
+            error_entry( CUFFT_INVALID_SIZE ),
+            error_entry( CUFFT_UNALIGNED_DATA )
+        };
+
+        const int cufft_error_num = sizeof(cufft_errors) / sizeof(cufft_errors[0]);
+
+        void ___cufftSafeCall(int err, const char* file, const int line, const char* func)
+        {
+            if (CUFFT_SUCCESS != err)
+            {
+                String msg = getErrorString(err, cufft_errors, cufft_error_num);
+                cv::error(cv::Error::GpuApiCallError, msg, func, file, line);
+            }
+        }
+    }
+
+    #define cufftSafeCall(expr) ___cufftSafeCall(expr, __FILE__, __LINE__, CV_Func)
+
+#endif
+
+////////////////////////////////////////////////////////////////////////
+// gemm
+
+void cv::cuda::gemm(InputArray _src1, InputArray _src2, double alpha, InputArray _src3, double beta, OutputArray _dst, int flags, Stream& stream)
+{
+#ifndef HAVE_CUBLAS
+    CV_UNUSED(_src1);
+    CV_UNUSED(_src2);
+    CV_UNUSED(alpha);
+    CV_UNUSED(_src3);
+    CV_UNUSED(beta);
+    CV_UNUSED(_dst);
+    CV_UNUSED(flags);
+    CV_UNUSED(stream);
+    CV_Error(Error::StsNotImplemented, "The library was built without CUBLAS");
+#else
+    // CUBLAS works with column-major matrices
+
+    GpuMat src1 = getInputMat(_src1, stream);
+    GpuMat src2 = getInputMat(_src2, stream);
+    GpuMat src3 = getInputMat(_src3, stream);
+
+    CV_Assert( src1.type() == CV_32FC1 || src1.type() == CV_32FC2 || src1.type() == CV_64FC1 || src1.type() == CV_64FC2 );
+    CV_Assert( src2.type() == src1.type() && (src3.empty() || src3.type() == src1.type()) );
+
+    if (src1.depth() == CV_64F)
+    {
+        if (!deviceSupports(NATIVE_DOUBLE))
+            CV_Error(cv::Error::StsUnsupportedFormat, "The device doesn't support double");
+    }
+
+    bool tr1 = (flags & GEMM_1_T) != 0;
+    bool tr2 = (flags & GEMM_2_T) != 0;
+    bool tr3 = (flags & GEMM_3_T) != 0;
+
+    if (src1.type() == CV_64FC2)
+    {
+        if (tr1 || tr2 || tr3)
+            CV_Error(cv::Error::StsNotImplemented, "transpose operation is not implemented for CV_64FC2 type");
+    }
+
+    Size src1Size = tr1 ? Size(src1.rows, src1.cols) : src1.size();
+    Size src2Size = tr2 ? Size(src2.rows, src2.cols) : src2.size();
+    Size src3Size = tr3 ?
Size(src3.rows, src3.cols) : src3.size(); + Size dstSize(src2Size.width, src1Size.height); + + CV_Assert( src1Size.width == src2Size.height ); + CV_Assert( src3.empty() || src3Size == dstSize ); + + GpuMat dst = getOutputMat(_dst, dstSize, src1.type(), stream); + + if (beta != 0) + { + if (src3.empty()) + { + dst.setTo(Scalar::all(0), stream); + } + else + { + if (tr3) + { + cuda::transpose(src3, dst, stream); + } + else + { + src3.copyTo(dst, stream); + } + } + } + + cublasHandle_t handle; + cublasSafeCall( cublasCreate_v2(&handle) ); + + cublasSafeCall( cublasSetStream_v2(handle, StreamAccessor::getStream(stream)) ); + + cublasSafeCall( cublasSetPointerMode_v2(handle, CUBLAS_POINTER_MODE_HOST) ); + + const float alphaf = static_cast(alpha); + const float betaf = static_cast(beta); + + const cuComplex alphacf = make_cuComplex(alphaf, 0); + const cuComplex betacf = make_cuComplex(betaf, 0); + + const cuDoubleComplex alphac = make_cuDoubleComplex(alpha, 0); + const cuDoubleComplex betac = make_cuDoubleComplex(beta, 0); + + cublasOperation_t transa = tr2 ? CUBLAS_OP_T : CUBLAS_OP_N; + cublasOperation_t transb = tr1 ? CUBLAS_OP_T : CUBLAS_OP_N; + + switch (src1.type()) + { + case CV_32FC1: + cublasSafeCall( cublasSgemm_v2(handle, transa, transb, tr2 ? src2.rows : src2.cols, tr1 ? src1.cols : src1.rows, tr2 ? src2.cols : src2.rows, + &alphaf, + src2.ptr(), static_cast(src2.step / sizeof(float)), + src1.ptr(), static_cast(src1.step / sizeof(float)), + &betaf, + dst.ptr(), static_cast(dst.step / sizeof(float))) ); + break; + + case CV_64FC1: + cublasSafeCall( cublasDgemm_v2(handle, transa, transb, tr2 ? src2.rows : src2.cols, tr1 ? src1.cols : src1.rows, tr2 ? src2.cols : src2.rows, + &alpha, + src2.ptr(), static_cast(src2.step / sizeof(double)), + src1.ptr(), static_cast(src1.step / sizeof(double)), + &beta, + dst.ptr(), static_cast(dst.step / sizeof(double))) ); + break; + + case CV_32FC2: + cublasSafeCall( cublasCgemm_v2(handle, transa, transb, tr2 ? src2.rows : src2.cols, tr1 ? src1.cols : src1.rows, tr2 ? src2.cols : src2.rows, + &alphacf, + src2.ptr(), static_cast(src2.step / sizeof(cuComplex)), + src1.ptr(), static_cast(src1.step / sizeof(cuComplex)), + &betacf, + dst.ptr(), static_cast(dst.step / sizeof(cuComplex))) ); + break; + + case CV_64FC2: + cublasSafeCall( cublasZgemm_v2(handle, transa, transb, tr2 ? src2.rows : src2.cols, tr1 ? src1.cols : src1.rows, tr2 ? 
src2.cols : src2.rows, + &alphac, + src2.ptr(), static_cast(src2.step / sizeof(cuDoubleComplex)), + src1.ptr(), static_cast(src1.step / sizeof(cuDoubleComplex)), + &betac, + dst.ptr(), static_cast(dst.step / sizeof(cuDoubleComplex))) ); + break; + } + + cublasSafeCall( cublasDestroy_v2(handle) ); + + syncOutput(dst, _dst, stream); +#endif +} + +////////////////////////////////////////////////////////////////////////////// +// DFT function + +void cv::cuda::dft(InputArray _src, OutputArray _dst, Size dft_size, int flags, Stream& stream) +{ + if (getInputMat(_src, stream).channels() == 2) + flags |= DFT_COMPLEX_INPUT; + + Ptr dft = createDFT(dft_size, flags); + dft->compute(_src, _dst, stream); +} + +////////////////////////////////////////////////////////////////////////////// +// DFT algorithm + +#ifdef HAVE_CUFFT + +namespace +{ + + class DFTImpl : public DFT + { + Size dft_size, dft_size_opt; + bool is_1d_input, is_row_dft, is_scaled_dft, is_inverse, is_complex_input, is_complex_output; + + cufftType dft_type; + cufftHandle plan; + + public: + DFTImpl(Size dft_size, int flags) + : dft_size(dft_size), + dft_size_opt(dft_size), + is_1d_input((dft_size.height == 1) || (dft_size.width == 1)), + is_row_dft((flags & DFT_ROWS) != 0), + is_scaled_dft((flags & DFT_SCALE) != 0), + is_inverse((flags & DFT_INVERSE) != 0), + is_complex_input((flags & DFT_COMPLEX_INPUT) != 0), + is_complex_output(!(flags & DFT_REAL_OUTPUT)), + dft_type(!is_complex_input ? CUFFT_R2C : (is_complex_output ? CUFFT_C2C : CUFFT_C2R)) + { + // We don't support unpacked output (in the case of real input) + CV_Assert( !(flags & DFT_COMPLEX_OUTPUT) ); + + // We don't support real-to-real transform + CV_Assert( is_complex_input || is_complex_output ); + + if (is_1d_input && !is_row_dft) + { + // If the source matrix is single column handle it as single row + dft_size_opt.width = std::max(dft_size.width, dft_size.height); + dft_size_opt.height = std::min(dft_size.width, dft_size.height); + } + + CV_Assert( dft_size_opt.width > 1 ); + + if (is_1d_input || is_row_dft) + cufftSafeCall( cufftPlan1d(&plan, dft_size_opt.width, dft_type, dft_size_opt.height) ); + else + cufftSafeCall( cufftPlan2d(&plan, dft_size_opt.height, dft_size_opt.width, dft_type) ); + } + + ~DFTImpl() + { + cufftSafeCall( cufftDestroy(plan) ); + } + + void compute(InputArray _src, OutputArray _dst, Stream& stream) + { + GpuMat src = getInputMat(_src, stream); + + CV_Assert( src.type() == CV_32FC1 || src.type() == CV_32FC2 ); + CV_Assert( is_complex_input == (src.channels() == 2) ); + + // Make sure here we work with the continuous input, + // as CUFFT can't handle gaps + GpuMat src_cont; + if (src.isContinuous()) + { + src_cont = src; + } + else + { + BufferPool pool(stream); + src_cont.allocator = pool.getAllocator(); + createContinuous(src.rows, src.cols, src.type(), src_cont); + src.copyTo(src_cont, stream); + } + + cufftSafeCall( cufftSetStream(plan, StreamAccessor::getStream(stream)) ); + + if (is_complex_input) + { + if (is_complex_output) + { + createContinuous(dft_size, CV_32FC2, _dst); + GpuMat dst = _dst.getGpuMat(); + + cufftSafeCall(cufftExecC2C( + plan, src_cont.ptr(), dst.ptr(), + is_inverse ? CUFFT_INVERSE : CUFFT_FORWARD)); + } + else + { + createContinuous(dft_size, CV_32F, _dst); + GpuMat dst = _dst.getGpuMat(); + + cufftSafeCall(cufftExecC2R( + plan, src_cont.ptr(), dst.ptr())); + } + } + else + { + // We could swap dft_size for efficiency. 
Here the output size must reflect whether that swap happened
+            if (dft_size == dft_size_opt)
+                createContinuous(Size(dft_size.width / 2 + 1, dft_size.height), CV_32FC2, _dst);
+            else
+                createContinuous(Size(dft_size.width, dft_size.height / 2 + 1), CV_32FC2, _dst);
+
+            GpuMat dst = _dst.getGpuMat();
+
+            cufftSafeCall(cufftExecR2C(
+                    plan, src_cont.ptr(), dst.ptr()));
+        }
+
+        if (is_scaled_dft)
+            cuda::multiply(_dst, Scalar::all(1. / dft_size.area()), _dst, 1, -1, stream);
+    }
+    };
+}
+
+#endif
+
+Ptr cv::cuda::createDFT(Size dft_size, int flags)
+{
+#ifndef HAVE_CUFFT
+    CV_UNUSED(dft_size);
+    CV_UNUSED(flags);
+    CV_Error(Error::StsNotImplemented, "The library was built without CUFFT");
+    return Ptr();
+#else
+    return makePtr(dft_size, flags);
+#endif
+}
+
+//////////////////////////////////////////////////////////////////////////////
+// Convolution
+
+#ifdef HAVE_CUFFT
+
+namespace
+{
+    class ConvolutionImpl : public Convolution
+    {
+    public:
+        explicit ConvolutionImpl(Size user_block_size_) : user_block_size(user_block_size_) {}
+
+        void convolve(InputArray image, InputArray templ, OutputArray result, bool ccorr = false, Stream& stream = Stream::Null());
+
+    private:
+        void create(Size image_size, Size templ_size);
+        static Size estimateBlockSize(Size result_size);
+
+        Size result_size;
+        Size block_size;
+        Size user_block_size;
+        Size dft_size;
+
+        GpuMat image_spect, templ_spect, result_spect;
+        GpuMat image_block, templ_block, result_data;
+    };
+
+    void ConvolutionImpl::create(Size image_size, Size templ_size)
+    {
+        result_size = Size(image_size.width - templ_size.width + 1,
+                           image_size.height - templ_size.height + 1);
+
+        block_size = user_block_size;
+        if (user_block_size.width == 0 || user_block_size.height == 0)
+            block_size = estimateBlockSize(result_size);
+
+        // Round each DFT dimension up to the next power of two
+        dft_size.width = 1 << int(ceil(std::log(block_size.width + templ_size.width - 1.) / std::log(2.)));
+        dft_size.height = 1 << int(ceil(std::log(block_size.height + templ_size.height - 1.)
/ std::log(2.))); + + // CUFFT has hard-coded kernels for power-of-2 sizes (up to 8192), + // see CUDA Toolkit 4.1 CUFFT Library Programming Guide + if (dft_size.width > 8192) + dft_size.width = getOptimalDFTSize(block_size.width + templ_size.width - 1); + if (dft_size.height > 8192) + dft_size.height = getOptimalDFTSize(block_size.height + templ_size.height - 1); + + // To avoid wasting time doing small DFTs + dft_size.width = std::max(dft_size.width, 512); + dft_size.height = std::max(dft_size.height, 512); + + createContinuous(dft_size, CV_32F, image_block); + createContinuous(dft_size, CV_32F, templ_block); + createContinuous(dft_size, CV_32F, result_data); + + int spect_len = dft_size.height * (dft_size.width / 2 + 1); + createContinuous(1, spect_len, CV_32FC2, image_spect); + createContinuous(1, spect_len, CV_32FC2, templ_spect); + createContinuous(1, spect_len, CV_32FC2, result_spect); + + // Use maximum result matrix block size for the estimated DFT block size + block_size.width = std::min(dft_size.width - templ_size.width + 1, result_size.width); + block_size.height = std::min(dft_size.height - templ_size.height + 1, result_size.height); + } + + Size ConvolutionImpl::estimateBlockSize(Size result_size) + { + int width = (result_size.width + 2) / 3; + int height = (result_size.height + 2) / 3; + width = std::min(width, result_size.width); + height = std::min(height, result_size.height); + return Size(width, height); + } + + void ConvolutionImpl::convolve(InputArray _image, InputArray _templ, OutputArray _result, bool ccorr, Stream& _stream) + { + GpuMat image = getInputMat(_image, _stream); + GpuMat templ = getInputMat(_templ, _stream); + + CV_Assert( image.type() == CV_32FC1 ); + CV_Assert( templ.type() == CV_32FC1 ); + + create(image.size(), templ.size()); + + GpuMat result = getOutputMat(_result, result_size, CV_32FC1, _stream); + + cudaStream_t stream = StreamAccessor::getStream(_stream); + + cufftHandle planR2C, planC2R; + cufftSafeCall( cufftPlan2d(&planC2R, dft_size.height, dft_size.width, CUFFT_C2R) ); + cufftSafeCall( cufftPlan2d(&planR2C, dft_size.height, dft_size.width, CUFFT_R2C) ); + + cufftSafeCall( cufftSetStream(planR2C, stream) ); + cufftSafeCall( cufftSetStream(planC2R, stream) ); + + GpuMat templ_roi(templ.size(), CV_32FC1, templ.data, templ.step); + cuda::copyMakeBorder(templ_roi, templ_block, 0, templ_block.rows - templ_roi.rows, 0, + templ_block.cols - templ_roi.cols, 0, Scalar(), _stream); + + cufftSafeCall( cufftExecR2C(planR2C, templ_block.ptr(), templ_spect.ptr()) ); + + // Process all blocks of the result matrix + for (int y = 0; y < result.rows; y += block_size.height) + { + for (int x = 0; x < result.cols; x += block_size.width) + { + Size image_roi_size(std::min(x + dft_size.width, image.cols) - x, + std::min(y + dft_size.height, image.rows) - y); + GpuMat image_roi(image_roi_size, CV_32F, (void*)(image.ptr(y) + x), + image.step); + cuda::copyMakeBorder(image_roi, image_block, 0, image_block.rows - image_roi.rows, + 0, image_block.cols - image_roi.cols, 0, Scalar(), _stream); + + cufftSafeCall(cufftExecR2C(planR2C, image_block.ptr(), + image_spect.ptr())); + cuda::mulAndScaleSpectrums(image_spect, templ_spect, result_spect, 0, + 1.f / dft_size.area(), ccorr, _stream); + cufftSafeCall(cufftExecC2R(planC2R, result_spect.ptr(), + result_data.ptr())); + + Size result_roi_size(std::min(x + block_size.width, result.cols) - x, + std::min(y + block_size.height, result.rows) - y); + GpuMat result_roi(result_roi_size, result.type(), + (void*)(result.ptr(y) + x), 
result.step);
+                GpuMat result_block(result_roi_size, result_data.type(),
+                                    result_data.ptr(), result_data.step);
+
+                result_block.copyTo(result_roi, _stream);
+            }
+        }
+
+        cufftSafeCall( cufftDestroy(planR2C) );
+        cufftSafeCall( cufftDestroy(planC2R) );
+
+        syncOutput(result, _result, _stream);
+    }
+}
+
+#endif
+
+Ptr cv::cuda::createConvolution(Size user_block_size)
+{
+#ifndef HAVE_CUFFT
+    CV_UNUSED(user_block_size);
+    CV_Error(Error::StsNotImplemented, "The library was built without CUFFT");
+    return Ptr();
+#else
+    return makePtr(user_block_size);
+#endif
+}
+
+#endif /* !defined (HAVE_CUDA) */
diff --git a/modules/cudaarithm/src/core.cpp b/modules/cudaarithm/src/core.cpp
new file mode 100644
index 00000000000..6d97e15dbbd
--- /dev/null
+++ b/modules/cudaarithm/src/core.cpp
@@ -0,0 +1,133 @@
+/*M///////////////////////////////////////////////////////////////////////////////////////
+//
+//  IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
+//
+//  By downloading, copying, installing or using the software you agree to this license.
+//  If you do not agree to this license, do not download, install,
+//  copy or use the software.
+//
+//
+//                           License Agreement
+//                For Open Source Computer Vision Library
+//
+// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
+// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
+// Third party copyrights are property of their respective owners.
+//
+// Redistribution and use in source and binary forms, with or without modification,
+// are permitted provided that the following conditions are met:
+//
+//   * Redistribution's of source code must retain the above copyright notice,
+//     this list of conditions and the following disclaimer.
+//
+//   * Redistribution's in binary form must reproduce the above copyright notice,
+//     this list of conditions and the following disclaimer in the documentation
+//     and/or other materials provided with the distribution.
+//
+//   * The name of the copyright holders may not be used to endorse or promote products
+//     derived from this software without specific prior written permission.
+//
+// This software is provided by the copyright holders and contributors "as is" and
+// any express or implied warranties, including, but not limited to, the implied
+// warranties of merchantability and fitness for a particular purpose are disclaimed.
+// In no event shall the Intel Corporation or contributors be liable for any direct,
+// indirect, incidental, special, exemplary, or consequential damages
+// (including, but not limited to, procurement of substitute goods or services;
+// loss of use, data, or profits; or business interruption) however caused
+// and on any theory of liability, whether in contract, strict liability,
+// or tort (including negligence or otherwise) arising in any way out of
+// the use of this software, even if advised of the possibility of such damage.
+// +//M*/ + +#include "precomp.hpp" + +using namespace cv; +using namespace cv::cuda; + +#if !defined (HAVE_CUDA) || defined (CUDA_DISABLER) + +void cv::cuda::merge(const GpuMat*, size_t, OutputArray, Stream&) { throw_no_cuda(); } +void cv::cuda::merge(const std::vector&, OutputArray, Stream&) { throw_no_cuda(); } + +void cv::cuda::split(InputArray, GpuMat*, Stream&) { throw_no_cuda(); } +void cv::cuda::split(InputArray, std::vector&, Stream&) { throw_no_cuda(); } + +void cv::cuda::transpose(InputArray, OutputArray, Stream&) { throw_no_cuda(); } + +void cv::cuda::flip(InputArray, OutputArray, int, Stream&) { throw_no_cuda(); } + +void cv::cuda::copyMakeBorder(InputArray, OutputArray, int, int, int, int, int, Scalar, Stream&) { throw_no_cuda(); } + +#else /* !defined (HAVE_CUDA) */ + +//////////////////////////////////////////////////////////////////////// +// flip + +namespace +{ + template struct NppTypeTraits; + template<> struct NppTypeTraits { typedef Npp8u npp_t; }; + template<> struct NppTypeTraits { typedef Npp8s npp_t; }; + template<> struct NppTypeTraits { typedef Npp16u npp_t; }; + template<> struct NppTypeTraits { typedef Npp16s npp_t; }; + template<> struct NppTypeTraits { typedef Npp32s npp_t; }; + template<> struct NppTypeTraits { typedef Npp32f npp_t; }; + template<> struct NppTypeTraits { typedef Npp64f npp_t; }; + + template struct NppMirrorFunc + { + typedef typename NppTypeTraits::npp_t npp_t; + + typedef NppStatus (*func_t)(const npp_t* pSrc, int nSrcStep, npp_t* pDst, int nDstStep, NppiSize oROI, NppiAxis flip); + }; + + template ::func_t func> struct NppMirror + { + typedef typename NppMirrorFunc::npp_t npp_t; + + static void call(const GpuMat& src, GpuMat& dst, int flipCode, cudaStream_t stream) + { + NppStreamHandler h(stream); + + NppiSize sz; + sz.width = src.cols; + sz.height = src.rows; + + nppSafeCall( func(src.ptr(), static_cast(src.step), + dst.ptr(), static_cast(dst.step), sz, + (flipCode == 0 ? NPP_HORIZONTAL_AXIS : (flipCode > 0 ? NPP_VERTICAL_AXIS : NPP_BOTH_AXIS))) ); + + if (stream == 0) + cudaSafeCall( cudaDeviceSynchronize() ); + } + }; +} + +void cv::cuda::flip(InputArray _src, OutputArray _dst, int flipCode, Stream& stream) +{ + typedef void (*func_t)(const GpuMat& src, GpuMat& dst, int flipCode, cudaStream_t stream); + static const func_t funcs[6][4] = + { + {NppMirror::call, 0, NppMirror::call, NppMirror::call}, + {0,0,0,0}, + {NppMirror::call, 0, NppMirror::call, NppMirror::call}, + {0,0,0,0}, + {NppMirror::call, 0, NppMirror::call, NppMirror::call}, + {NppMirror::call, 0, NppMirror::call, NppMirror::call} + }; + + GpuMat src = getInputMat(_src, stream); + + CV_Assert(src.depth() == CV_8U || src.depth() == CV_16U || src.depth() == CV_32S || src.depth() == CV_32F); + CV_Assert(src.channels() == 1 || src.channels() == 3 || src.channels() == 4); + + _dst.create(src.size(), src.type()); + GpuMat dst = getOutputMat(_dst, src.size(), src.type(), stream); + + funcs[src.depth()][src.channels() - 1](src, dst, flipCode, StreamAccessor::getStream(stream)); + + syncOutput(dst, _dst, stream); +} + +#endif /* !defined (HAVE_CUDA) */ diff --git a/modules/cudaarithm/src/cuda/absdiff_mat.cu b/modules/cudaarithm/src/cuda/absdiff_mat.cu new file mode 100644 index 00000000000..ec04f122845 --- /dev/null +++ b/modules/cudaarithm/src/cuda/absdiff_mat.cu @@ -0,0 +1,188 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. 
+// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. 
+// +//M*/ + +#include "opencv2/opencv_modules.hpp" + +#ifndef HAVE_OPENCV_CUDEV + +#error "opencv_cudev is required" + +#else + +#include "opencv2/cudev.hpp" + +using namespace cv::cudev; + +void absDiffMat(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat&, double, Stream& stream, int); + +namespace +{ + __device__ __forceinline__ int _abs(int a) + { + return ::abs(a); + } + __device__ __forceinline__ float _abs(float a) + { + return ::fabsf(a); + } + __device__ __forceinline__ double _abs(double a) + { + return ::fabs(a); + } + + template struct AbsDiffOp1 : binary_function + { + __device__ __forceinline__ T operator ()(T a, T b) const + { + return saturate_cast(_abs(a - b)); + } + }; + + template struct TransformPolicy : DefaultTransformPolicy + { + }; + template <> struct TransformPolicy : DefaultTransformPolicy + { + enum { + shift = 1 + }; + }; + + template + void absDiffMat_v1(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, Stream& stream) + { + gridTransformBinary_< TransformPolicy >(globPtr(src1), globPtr(src2), globPtr(dst), AbsDiffOp1(), stream); + } + + struct AbsDiffOp2 : binary_function + { + __device__ __forceinline__ uint operator ()(uint a, uint b) const + { + return vabsdiff2(a, b); + } + }; + + void absDiffMat_v2(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, Stream& stream) + { + const int vcols = src1.cols >> 1; + + GlobPtrSz src1_ = globPtr((uint*) src1.data, src1.step, src1.rows, vcols); + GlobPtrSz src2_ = globPtr((uint*) src2.data, src2.step, src1.rows, vcols); + GlobPtrSz dst_ = globPtr((uint*) dst.data, dst.step, src1.rows, vcols); + + gridTransformBinary(src1_, src2_, dst_, AbsDiffOp2(), stream); + } + + struct AbsDiffOp4 : binary_function + { + __device__ __forceinline__ uint operator ()(uint a, uint b) const + { + return vabsdiff4(a, b); + } + }; + + void absDiffMat_v4(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, Stream& stream) + { + const int vcols = src1.cols >> 2; + + GlobPtrSz src1_ = globPtr((uint*) src1.data, src1.step, src1.rows, vcols); + GlobPtrSz src2_ = globPtr((uint*) src2.data, src2.step, src1.rows, vcols); + GlobPtrSz dst_ = globPtr((uint*) dst.data, dst.step, src1.rows, vcols); + + gridTransformBinary(src1_, src2_, dst_, AbsDiffOp4(), stream); + } +} + +void absDiffMat(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat&, double, Stream& stream, int) +{ + typedef void (*func_t)(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, Stream& stream); + static const func_t funcs[] = + { + absDiffMat_v1, + absDiffMat_v1, + absDiffMat_v1, + absDiffMat_v1, + absDiffMat_v1, + absDiffMat_v1, + absDiffMat_v1 + }; + + const int depth = src1.depth(); + + CV_DbgAssert( depth <= CV_64F ); + + GpuMat src1_ = src1.reshape(1); + GpuMat src2_ = src2.reshape(1); + GpuMat dst_ = dst.reshape(1); + + if (depth == CV_8U || depth == CV_16U) + { + const intptr_t src1ptr = reinterpret_cast(src1_.data); + const intptr_t src2ptr = reinterpret_cast(src2_.data); + const intptr_t dstptr = reinterpret_cast(dst_.data); + + const bool isAllAligned = (src1ptr & 31) == 0 && (src2ptr & 31) == 0 && (dstptr & 31) == 0; + + if (isAllAligned) + { + if (depth == CV_8U && (src1_.cols & 3) == 0) + { + absDiffMat_v4(src1_, src2_, dst_, stream); + return; + } + else if (depth == CV_16U && (src1_.cols & 1) == 0) + { + absDiffMat_v2(src1_, src2_, dst_, stream); + return; + } + } + } + + const func_t func = funcs[depth]; + + if (!func) + CV_Error(cv::Error::StsUnsupportedFormat, "Unsupported combination of source and destination types"); + + 
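+    // Fall back to the per-depth kernel selected from the table above; the
+    // vectorized v2/v4 paths were taken earlier only for suitably aligned
+    // CV_8U / CV_16U data.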
func(src1_, src2_, dst_, stream); +} + +#endif diff --git a/modules/cudaarithm/src/cuda/absdiff_scalar.cu b/modules/cudaarithm/src/cuda/absdiff_scalar.cu new file mode 100644 index 00000000000..0955e40c8b1 --- /dev/null +++ b/modules/cudaarithm/src/cuda/absdiff_scalar.cu @@ -0,0 +1,133 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. 
+// +//M*/ + +#include "opencv2/opencv_modules.hpp" + +#ifndef HAVE_OPENCV_CUDEV + +#error "opencv_cudev is required" + +#else + +#include "opencv2/cudev.hpp" + +using namespace cv::cudev; + +void absDiffScalar(const GpuMat& src, cv::Scalar val, bool, GpuMat& dst, const GpuMat&, double, Stream& stream, int); + +namespace +{ + template struct AbsDiffScalarOp : unary_function + { + ScalarType val; + + __device__ __forceinline__ DstType operator ()(SrcType a) const + { + abs_func f; + return saturate_cast(f(saturate_cast(a) - val)); + } + }; + + template struct TransformPolicy : DefaultTransformPolicy + { + }; + template <> struct TransformPolicy : DefaultTransformPolicy + { + enum { + shift = 1 + }; + }; + + template + void absDiffScalarImpl(const GpuMat& src, cv::Scalar value, GpuMat& dst, Stream& stream) + { + typedef typename MakeVec::cn>::type ScalarType; + + cv::Scalar_ value_ = value; + + AbsDiffScalarOp op; + op.val = VecTraits::make(value_.val); + gridTransformUnary_< TransformPolicy >(globPtr(src), globPtr(dst), op, stream); + } +} + +void absDiffScalar(const GpuMat& src, cv::Scalar val, bool, GpuMat& dst, const GpuMat&, double, Stream& stream, int) +{ + typedef void (*func_t)(const GpuMat& src, cv::Scalar val, GpuMat& dst, Stream& stream); + static const func_t funcs[7][4] = + { + { + absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl + }, + { + absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl + }, + { + absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl + }, + { + absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl + }, + { + absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl + }, + { + absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl + }, + { + absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl, absDiffScalarImpl + } + }; + + const int sdepth = src.depth(); + const int cn = src.channels(); + + CV_DbgAssert( sdepth <= CV_64F && cn <= 4 && src.type() == dst.type()); + + const func_t func = funcs[sdepth][cn - 1]; + if (!func) + CV_Error(cv::Error::StsUnsupportedFormat, "Unsupported combination of source and destination types"); + + func(src, val, dst, stream); +} + +#endif diff --git a/modules/cudaarithm/src/cuda/add_mat.cu b/modules/cudaarithm/src/cuda/add_mat.cu new file mode 100644 index 00000000000..4166cc104e0 --- /dev/null +++ b/modules/cudaarithm/src/cuda/add_mat.cu @@ -0,0 +1,225 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. 
+// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. +// +//M*/ + +#include "opencv2/opencv_modules.hpp" + +#ifndef HAVE_OPENCV_CUDEV + +#error "opencv_cudev is required" + +#else + +#include "opencv2/cudev.hpp" + +using namespace cv::cudev; + +void addMat(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, double, Stream& _stream, int); + +namespace +{ + template struct AddOp1 : binary_function + { + __device__ __forceinline__ D operator ()(T a, T b) const + { + return saturate_cast(a + b); + } + }; + + template + void addMat_v1(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, Stream& stream) + { + if (mask.data) + gridTransformBinary(globPtr(src1), globPtr(src2), globPtr(dst), AddOp1(), globPtr(mask), stream); + else + gridTransformBinary(globPtr(src1), globPtr(src2), globPtr(dst), AddOp1(), stream); + } + + struct AddOp2 : binary_function + { + __device__ __forceinline__ uint operator ()(uint a, uint b) const + { + return vadd2(a, b); + } + }; + + void addMat_v2(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, Stream& stream) + { + const int vcols = src1.cols >> 1; + + GlobPtrSz src1_ = globPtr((uint*) src1.data, src1.step, src1.rows, vcols); + GlobPtrSz src2_ = globPtr((uint*) src2.data, src2.step, src1.rows, vcols); + GlobPtrSz dst_ = globPtr((uint*) dst.data, dst.step, src1.rows, vcols); + + gridTransformBinary(src1_, src2_, dst_, AddOp2(), stream); + } + + struct AddOp4 : binary_function + { + __device__ __forceinline__ uint operator ()(uint a, uint b) const + { + return vadd4(a, b); + } + }; + + void addMat_v4(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, Stream& stream) + { + const int vcols = src1.cols >> 2; + + GlobPtrSz src1_ = globPtr((uint*) src1.data, src1.step, src1.rows, vcols); + GlobPtrSz src2_ = globPtr((uint*) src2.data, src2.step, src1.rows, vcols); + GlobPtrSz dst_ = globPtr((uint*) dst.data, dst.step, src1.rows, vcols); + + gridTransformBinary(src1_, src2_, dst_, AddOp4(), stream); + } +} + +void addMat(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, double, Stream& stream, int) +{ + typedef void (*func_t)(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, Stream& stream); + static const func_t funcs[7][7] = + { + { + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1 + }, + { + addMat_v1, + addMat_v1, + addMat_v1, 
+ addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1 + }, + { + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1 + }, + { + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1, + addMat_v1 + }, + { + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + addMat_v1, + addMat_v1, + addMat_v1 + }, + { + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + addMat_v1, + addMat_v1 + }, + { + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + 0 /*addMat_v1*/, + addMat_v1 + } + }; + + const int sdepth = src1.depth(); + const int ddepth = dst.depth(); + + CV_DbgAssert( sdepth <= CV_64F && ddepth <= CV_64F ); + + GpuMat src1_ = src1.reshape(1); + GpuMat src2_ = src2.reshape(1); + GpuMat dst_ = dst.reshape(1); + + if (mask.empty() && (sdepth == CV_8U || sdepth == CV_16U) && ddepth == sdepth) + { + const intptr_t src1ptr = reinterpret_cast(src1_.data); + const intptr_t src2ptr = reinterpret_cast(src2_.data); + const intptr_t dstptr = reinterpret_cast(dst_.data); + + const bool isAllAligned = (src1ptr & 31) == 0 && (src2ptr & 31) == 0 && (dstptr & 31) == 0; + + if (isAllAligned) + { + if (sdepth == CV_8U && (src1_.cols & 3) == 0) + { + addMat_v4(src1_, src2_, dst_, stream); + return; + } + else if (sdepth == CV_16U && (src1_.cols & 1) == 0) + { + addMat_v2(src1_, src2_, dst_, stream); + return; + } + } + } + + const func_t func = funcs[sdepth][ddepth]; + + if (!func) + CV_Error(cv::Error::StsUnsupportedFormat, "Unsupported combination of source and destination types"); + + func(src1_, src2_, dst_, mask, stream); +} + +#endif diff --git a/modules/cudaarithm/src/cuda/add_scalar.cu b/modules/cudaarithm/src/cuda/add_scalar.cu new file mode 100644 index 00000000000..92838a2a57d --- /dev/null +++ b/modules/cudaarithm/src/cuda/add_scalar.cu @@ -0,0 +1,180 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. 
+// In no event shall the Intel Corporation or contributors be liable for any direct,
+// indirect, incidental, special, exemplary, or consequential damages
+// (including, but not limited to, procurement of substitute goods or services;
+// loss of use, data, or profits; or business interruption) however caused
+// and on any theory of liability, whether in contract, strict liability,
+// or tort (including negligence or otherwise) arising in any way out of
+// the use of this software, even if advised of the possibility of such damage.
+//
+//M*/
+
+#include "opencv2/opencv_modules.hpp"
+
+#ifndef HAVE_OPENCV_CUDEV
+
+#error "opencv_cudev is required"
+
+#else
+
+#include "opencv2/cudev.hpp"
+
+using namespace cv::cudev;
+
+void addScalar(const GpuMat& src, cv::Scalar val, bool, GpuMat& dst, const GpuMat& mask, double, Stream& stream, int);
+
+namespace
+{
+    template <typename SrcType, typename ScalarType, typename DstType> struct AddScalarOp : unary_function<SrcType, DstType>
+    {
+        ScalarType val;
+
+        __device__ __forceinline__ DstType operator ()(SrcType a) const
+        {
+            return saturate_cast<DstType>(saturate_cast<ScalarType>(a) + val);
+        }
+    };
+
+    template <typename ScalarDepth> struct TransformPolicy : DefaultTransformPolicy
+    {
+    };
+    template <> struct TransformPolicy<double> : DefaultTransformPolicy
+    {
+        enum {
+            shift = 1
+        };
+    };
+
+    template <typename SrcType, typename ScalarDepth, typename DstType>
+    void addScalarImpl(const GpuMat& src, cv::Scalar value, GpuMat& dst, const GpuMat& mask, Stream& stream)
+    {
+        typedef typename MakeVec<ScalarDepth, VecTraits<SrcType>::cn>::type ScalarType;
+
+        cv::Scalar_<ScalarDepth> value_ = value;
+
+        AddScalarOp<SrcType, ScalarType, DstType> op;
+        op.val = VecTraits<ScalarType>::make(value_.val);
+
+        if (mask.data)
+            gridTransformUnary_< TransformPolicy<ScalarDepth> >(globPtr<SrcType>(src), globPtr<DstType>(dst), op, globPtr<uchar>(mask), stream);
+        else
+            gridTransformUnary_< TransformPolicy<ScalarDepth> >(globPtr<SrcType>(src), globPtr<DstType>(dst), op, stream);
+    }
+}
+
+void addScalar(const GpuMat& src, cv::Scalar val, bool, GpuMat& dst, const GpuMat& mask, double, Stream& stream, int)
+{
+    typedef void (*func_t)(const GpuMat& src, cv::Scalar val, GpuMat& dst, const GpuMat& mask, Stream& stream);
+    static const func_t funcs[7][7][4] =
+    {
+        {
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}
+        },
+        {
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}
+        },
+        {
+            {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/},
+            {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl},
+            {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}
+        },
+        {
+            {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/},
+            {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0
/*addScalarImpl*/, 0 /*addScalarImpl*/}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl} + }, + { + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl} + }, + { + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl} + }, + { + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/, 0 /*addScalarImpl*/}, + {addScalarImpl, addScalarImpl, addScalarImpl, addScalarImpl} + } + }; + + const int sdepth = src.depth(); + const int ddepth = dst.depth(); + const int cn = src.channels(); + + CV_DbgAssert( sdepth <= CV_64F && ddepth <= CV_64F && cn <= 4 ); + + const func_t func = funcs[sdepth][ddepth][cn - 1]; + + if (!func) + CV_Error(cv::Error::StsUnsupportedFormat, "Unsupported combination of source and destination types"); + + func(src, val, dst, mask, stream); +} + +#endif diff --git a/modules/cudaarithm/src/cuda/add_weighted.cu b/modules/cudaarithm/src/cuda/add_weighted.cu new file mode 100644 index 00000000000..929301076d3 --- /dev/null +++ b/modules/cudaarithm/src/cuda/add_weighted.cu @@ -0,0 +1,596 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. 
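The `addScalarImpl` above widens each pixel to the scalar's working depth (`float` in most cases, `double` where the policy specialization applies), adds the per-channel scalar, and saturates back to the destination type. A small sketch under the same build assumptions; the brightness constant and the helper name are illustrative:

```cpp
#include "opencv2/cudaarithm.hpp"

// Brighten a BGR image on the GPU; passing a cv::Scalar as the second
// operand makes cv::cuda::add dispatch to the addScalar path above.
void brighten(const cv::cuda::GpuMat& src, cv::cuda::GpuMat& dst,
              cv::cuda::Stream& stream = cv::cuda::Stream::Null())
{
    CV_Assert(src.type() == CV_8UC3);
    cv::cuda::add(src, cv::Scalar(40, 40, 40), dst, cv::noArray(), -1, stream);
}
```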
+//
+// Redistribution and use in source and binary forms, with or without modification,
+// are permitted provided that the following conditions are met:
+//
+// * Redistribution's of source code must retain the above copyright notice,
+// this list of conditions and the following disclaimer.
+//
+// * Redistribution's in binary form must reproduce the above copyright notice,
+// this list of conditions and the following disclaimer in the documentation
+// and/or other materials provided with the distribution.
+//
+// * The name of the copyright holders may not be used to endorse or promote products
+// derived from this software without specific prior written permission.
+//
+// This software is provided by the copyright holders and contributors "as is" and
+// any express or implied warranties, including, but not limited to, the implied
+// warranties of merchantability and fitness for a particular purpose are disclaimed.
+// In no event shall the Intel Corporation or contributors be liable for any direct,
+// indirect, incidental, special, exemplary, or consequential damages
+// (including, but not limited to, procurement of substitute goods or services;
+// loss of use, data, or profits; or business interruption) however caused
+// and on any theory of liability, whether in contract, strict liability,
+// or tort (including negligence or otherwise) arising in any way out of
+// the use of this software, even if advised of the possibility of such damage.
+//
+//M*/
+
+#include "opencv2/opencv_modules.hpp"
+
+#ifndef HAVE_OPENCV_CUDEV
+
+#error "opencv_cudev is required"
+
+#else
+
+#include "opencv2/cudaarithm.hpp"
+#include "opencv2/cudev.hpp"
+#include "opencv2/core/private.cuda.hpp"
+
+using namespace cv;
+using namespace cv::cuda;
+using namespace cv::cudev;
+
+namespace
+{
+    template <typename T1, typename T2, typename D, typename S> struct AddWeightedOp : binary_function<T1, T2, D>
+    {
+        S alpha;
+        S beta;
+        S gamma;
+
+        __device__ __forceinline__ D operator ()(T1 a, T2 b) const
+        {
+            return cudev::saturate_cast<D>(a * alpha + b * beta + gamma);
+        }
+    };
+
+    template <typename ScalarDepth> struct TransformPolicy : DefaultTransformPolicy
+    {
+    };
+    template <> struct TransformPolicy<double> : DefaultTransformPolicy
+    {
+        enum {
+            shift = 1
+        };
+    };
+
+    template <typename T1, typename T2, typename D>
+    void addWeightedImpl(const GpuMat& src1, double alpha, const GpuMat& src2, double beta, double gamma, GpuMat& dst, Stream& stream)
+    {
+        typedef typename LargerType<T1, T2>::type larger_type1;
+        typedef typename LargerType<larger_type1, D>::type larger_type2;
+        typedef typename LargerType<larger_type2, float>::type scalar_type;
+
+        AddWeightedOp<T1, T2, D, scalar_type> op;
+        op.alpha = static_cast<scalar_type>(alpha);
+        op.beta = static_cast<scalar_type>(beta);
+        op.gamma = static_cast<scalar_type>(gamma);
+
+        gridTransformBinary_< TransformPolicy<scalar_type> >(globPtr<T1>(src1), globPtr<T2>(src2), globPtr<D>(dst), op, stream);
+    }
+}
+
+void cv::cuda::addWeighted(InputArray _src1, double alpha, InputArray _src2, double beta, double gamma, OutputArray _dst, int ddepth, Stream& stream)
+{
+    typedef void (*func_t)(const GpuMat& src1, double alpha, const GpuMat& src2, double beta, double gamma, GpuMat& dst, Stream& stream);
+    static const func_t funcs[7][7][7] =
+    {
+        {
+            {
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl
+            },
+            {
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl
+            },
+            {
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl
+            },
+            {
+                addWeightedImpl,
+                addWeightedImpl,
+                addWeightedImpl,
+
addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + } + }, + { + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + } + }, + { + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + } + }, + { + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, 
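+            // NOTE (editorial annotation, not part of the original patch): funcs is
+            // indexed as [sdepth1][sdepth2][ddepth]; blocks with sdepth2 < sdepth1
+            // stay NULL because the dispatcher below swaps the operands (and
+            // alpha/beta) so the shallower source depth comes first before lookup.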
+ { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + } + }, + { + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + } + }, + { + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + }, + { + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + } + }, + { + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/, + 0/*addWeightedImpl*/ + }, + { + addWeightedImpl, + addWeightedImpl, + 
addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl, + addWeightedImpl + } + } + }; + + GpuMat src1 = getInputMat(_src1, stream); + GpuMat src2 = getInputMat(_src2, stream); + + int sdepth1 = src1.depth(); + int sdepth2 = src2.depth(); + + ddepth = ddepth >= 0 ? CV_MAT_DEPTH(ddepth) : std::max(sdepth1, sdepth2); + const int cn = src1.channels(); + + CV_Assert( src2.size() == src1.size() && src2.channels() == cn ); + CV_Assert( sdepth1 <= CV_64F && sdepth2 <= CV_64F && ddepth <= CV_64F ); + + GpuMat dst = getOutputMat(_dst, src1.size(), CV_MAKE_TYPE(ddepth, cn), stream); + + GpuMat src1_single = src1.reshape(1); + GpuMat src2_single = src2.reshape(1); + GpuMat dst_single = dst.reshape(1); + + if (sdepth1 > sdepth2) + { + src1_single.swap(src2_single); + std::swap(alpha, beta); + std::swap(sdepth1, sdepth2); + } + + const func_t func = funcs[sdepth1][sdepth2][ddepth]; + + if (!func) + CV_Error(cv::Error::StsUnsupportedFormat, "Unsupported combination of source and destination types"); + + func(src1_single, alpha, src2_single, beta, gamma, dst_single, stream); + + syncOutput(dst, _dst, stream); +} + +#endif diff --git a/modules/cudaarithm/src/cuda/bitwise_mat.cu b/modules/cudaarithm/src/cuda/bitwise_mat.cu new file mode 100644 index 00000000000..f151c1a4862 --- /dev/null +++ b/modules/cudaarithm/src/cuda/bitwise_mat.cu @@ -0,0 +1,230 @@ +/*M/////////////////////////////////////////////////////////////////////////////////////// +// +// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. +// +// By downloading, copying, installing or using the software you agree to this license. +// If you do not agree to this license, do not download, install, +// copy or use the software. +// +// +// License Agreement +// For Open Source Computer Vision Library +// +// Copyright (C) 2000-2008, Intel Corporation, all rights reserved. +// Copyright (C) 2009, Willow Garage Inc., all rights reserved. +// Third party copyrights are property of their respective owners. +// +// Redistribution and use in source and binary forms, with or without modification, +// are permitted provided that the following conditions are met: +// +// * Redistribution's of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// +// * Redistribution's in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// * The name of the copyright holders may not be used to endorse or promote products +// derived from this software without specific prior written permission. +// +// This software is provided by the copyright holders and contributors "as is" and +// any express or implied warranties, including, but not limited to, the implied +// warranties of merchantability and fitness for a particular purpose are disclaimed. +// In no event shall the Intel Corporation or contributors be liable for any direct, +// indirect, incidental, special, exemplary, or consequential damages +// (including, but not limited to, procurement of substitute goods or services; +// loss of use, data, or profits; or business interruption) however caused +// and on any theory of liability, whether in contract, strict liability, +// or tort (including negligence or otherwise) arising in any way out of +// the use of this software, even if advised of the possibility of such damage. 
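Because `addWeighted` swaps its operands so that `sdepth1 <= sdepth2`, only the upper triangle of the type table above is ever instantiated. A usage sketch, again assuming a CUDA-enabled build with `cudaarithm`; the file names are placeholders:

```cpp
#include "opencv2/cudaarithm.hpp"
#include "opencv2/imgcodecs.hpp"

int main()
{
    cv::Mat h1 = cv::imread("a.png"), h2 = cv::imread("b.png");
    CV_Assert(!h1.empty() && h1.size() == h2.size() && h1.type() == h2.type());

    cv::cuda::GpuMat d1(h1), d2(h2), blend; // constructor uploads to the device

    // blend = 0.7*d1 + 0.3*d2 + 0; ddepth = -1 keeps the wider source depth.
    cv::cuda::addWeighted(d1, 0.7, d2, 0.3, 0.0, blend);

    cv::Mat out;
    blend.download(out);
    cv::imwrite("blend.png", out);
    return 0;
}
```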
+//
+//M*/
+
+#include "opencv2/opencv_modules.hpp"
+
+#ifndef HAVE_OPENCV_CUDEV
+
+#error "opencv_cudev is required"
+
+#else
+
+#include "opencv2/cudaarithm.hpp"
+#include "opencv2/cudev.hpp"
+#include "opencv2/core/private.cuda.hpp"
+
+using namespace cv;
+using namespace cv::cuda;
+using namespace cv::cudev;
+
+void bitMat(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, double, Stream& stream, int op);
+
+//////////////////////////////////////////////////////////////////////////////
+/// bitwise_not
+
+void cv::cuda::bitwise_not(InputArray _src, OutputArray _dst, InputArray _mask, Stream& stream)
+{
+    GpuMat src = getInputMat(_src, stream);
+    GpuMat mask = getInputMat(_mask, stream);
+
+    const int depth = src.depth();
+
+    CV_DbgAssert( depth <= CV_32F );
+    CV_DbgAssert( mask.empty() || (mask.type() == CV_8UC1 && mask.size() == src.size()) );
+
+    GpuMat dst = getOutputMat(_dst, src.size(), src.type(), stream);
+
+    if (mask.empty())
+    {
+        const int bcols = (int) (src.cols * src.elemSize());
+
+        if ((bcols & 3) == 0)
+        {
+            const int vcols = bcols >> 2;
+
+            GlobPtrSz<uint> vsrc = globPtr((uint*) src.data, src.step, src.rows, vcols);
+            GlobPtrSz<uint> vdst = globPtr((uint*) dst.data, dst.step, src.rows, vcols);
+
+            gridTransformUnary(vsrc, vdst, bit_not<uint>(), stream);
+        }
+        else if ((bcols & 1) == 0)
+        {
+            const int vcols = bcols >> 1;
+
+            GlobPtrSz<ushort> vsrc = globPtr((ushort*) src.data, src.step, src.rows, vcols);
+            GlobPtrSz<ushort> vdst = globPtr((ushort*) dst.data, dst.step, src.rows, vcols);
+
+            gridTransformUnary(vsrc, vdst, bit_not<ushort>(), stream);
+        }
+        else
+        {
+            GlobPtrSz<uchar> vsrc = globPtr((uchar*) src.data, src.step, src.rows, bcols);
+            GlobPtrSz<uchar> vdst = globPtr((uchar*) dst.data, dst.step, src.rows, bcols);
+
+            gridTransformUnary(vsrc, vdst, bit_not<uchar>(), stream);
+        }
+    }
+    else
+    {
+        if (depth == CV_32F || depth == CV_32S)
+        {
+            GlobPtrSz<uint> vsrc = globPtr((uint*) src.data, src.step, src.rows, src.cols * src.channels());
+            GlobPtrSz<uint> vdst = globPtr((uint*) dst.data, dst.step, src.rows, src.cols * src.channels());
+
+            gridTransformUnary(vsrc, vdst, bit_not<uint>(), singleMaskChannels(globPtr<uchar>(mask), src.channels()), stream);
+        }
+        else if (depth == CV_16S || depth == CV_16U)
+        {
+            GlobPtrSz<ushort> vsrc = globPtr((ushort*) src.data, src.step, src.rows, src.cols * src.channels());
+            GlobPtrSz<ushort> vdst = globPtr((ushort*) dst.data, dst.step, src.rows, src.cols * src.channels());
+
+            gridTransformUnary(vsrc, vdst, bit_not<ushort>(), singleMaskChannels(globPtr<uchar>(mask), src.channels()), stream);
+        }
+        else
+        {
+            GlobPtrSz<uchar> vsrc = globPtr((uchar*) src.data, src.step, src.rows, src.cols * src.channels());
+            GlobPtrSz<uchar> vdst = globPtr((uchar*) dst.data, dst.step, src.rows, src.cols * src.channels());
+
+            gridTransformUnary(vsrc, vdst, bit_not<uchar>(), singleMaskChannels(globPtr<uchar>(mask), src.channels()), stream);
+        }
+    }
+
+    syncOutput(dst, _dst, stream);
+}
+
+//////////////////////////////////////////////////////////////////////////////
+/// Binary bitwise logical operations
+
+namespace
+{
+    template
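For completeness, a sketch exercising the unary entry point above. With no mask and a row byte-width divisible by four, `bitwise_not` runs the `uint`/`bit_not<uint>` grid transform, flipping four bytes per element; the helper name below is illustrative:

```cpp
#include "opencv2/cudaarithm.hpp"

// Invert every bit of an image asynchronously on the given stream.
void invert(const cv::cuda::GpuMat& src, cv::cuda::GpuMat& dst,
            cv::cuda::Stream& stream)
{
    cv::cuda::bitwise_not(src, dst, cv::noArray(), stream);
}
```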