
Delve into OpenCV


TODOs

namespaces and so on

  • hal: Hardware Acceleration Layer. This separate module is used to accelerate OpenCV on different platforms. The HAL can be developed independently of OpenCV. A HAL doesn’t have to implement all operations, and it doesn’t depend on the OpenCV API.
  • SSE2: Streaming SIMD Extensions 2 (an instruction set related to hardware acceleration)
  • ocl: Open Computing Language (OpenCL) is an open standard for writing code that runs across heterogeneous platforms, including CPUs, GPUs, DSPs, etc.
  • IPP: Intel Integrated Performance Primitives
  • TBB: Intel Threading Building Blocks. Parallelization on Android is done via Intel TBB. For a simple usage tutorial, see here

main modules

highgui

  • setWindowProperty

core

core-array

  • copyMakeBorder: the various borderTypes

  • perspectiveTransform

  • gemm: generalized matrix multiplication; for example, with the flags GEMM_1_T + GEMM_3_T it computes
    $$\texttt{dst} = \texttt{alpha} \cdot \texttt{src1} ^T \cdot \texttt{src2} + \texttt{beta} \cdot \texttt{src3} ^T$$

  • mulTransposed: computes
    $$\texttt{dst} = \texttt{scale} ( \texttt{src} - \texttt{delta} )^T ( \texttt{src} - \texttt{delta} )$$

    Used in calcCovarMatrix.

  • convertPointsToHomogeneous, convertPointsFromHomogeneous

  • reduce: "reduces" a matrix to a vector, row-wise or column-wise, by taking the maximum/minimum (CV_REDUCE_MAX/MIN), the mean (CV_REDUCE_AVG), or the sum (CV_REDUCE_SUM)

  • calcCovarMatrix: computes the covariance matrix. The result is not normalized, so it has to be divided manually by $n-1$, for example

    #include <opencv2/core.hpp>

    cv::Mat data = (cv::Mat_<double>(2, 6) << -2.5, -1.5, -0.5, 0.5, 1.5, 2.5,
                                            -0.8, -1.2, 0.4, -0.5, 0.3, 1.8);
    cv::Mat mean, covar;
    cv::calcCovarMatrix(data, covar, mean, cv::COVAR_NORMAL + cv::COVAR_COLS);
    // covar is [[17.5, 8.3], [8.3, 5.82]]; every entry should still be divided by 6 - 1 = 5
    
  • SVD::compute

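  • LDA: the number of "projection axes" for dimensionality reduction/classification depends on the number of classes in the labeled input data (at most $C-1$ axes for $C$ classes), unlike PCA, where it depends on the number of feature dimensions of the input data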

  • solve

  • eigen

  • norm

  • dft

  • LUT

  • x
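The relation between the mulTransposed formula and the unnormalized calcCovarMatrix result can be checked by hand. The sketch below is plain Python and makes no OpenCV calls; `scatter_matrix` is a hypothetical helper name, and it assumes one sample per column, as with COVAR_COLS in the C++ example above.

```python
def scatter_matrix(samples):
    """Return (mean, S) with S = sum over samples of (x - mean)(x - mean)^T.

    `samples` is a list of feature vectors, one per sample.
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    scatter = [[sum(c[i] * c[j] for c in centered) for j in range(dim)]
               for i in range(dim)]
    return mean, scatter


# Same numbers as the C++ example: 2 features (rows), 6 samples (columns).
rows = [[-2.5, -1.5, -0.5, 0.5, 1.5, 2.5],
        [-0.8, -1.2, 0.4, -0.5, 0.3, 1.8]]
samples = [list(col) for col in zip(*rows)]

mean, scatter = scatter_matrix(samples)
n = len(samples)
covariance = [[v / (n - 1) for v in row] for row in scatter]

print(scatter)     # unnormalized, approximately [[17.5, 8.3], [8.3, 5.82]]
print(covariance)  # after dividing by n - 1, approximately [[3.5, 1.66], [1.66, 1.164]]
```

The unnormalized matrix is the scatter matrix, i.e. the mulTransposed expression with scale 1 and delta set to the mean; dividing by $n-1$ afterwards gives the unbiased covariance estimate.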

core-optim

core-utils

  • Reference: opencvdoc: Optimization Algorithms

  • setUseOptimized, useOptimized

  • getTickCount, getTickFrequency: measure program execution time

    #include <iostream>
    #include <opencv2/core.hpp>

    int64 e1 = cv::getTickCount();
    // do some work
    int64 e2 = cv::getTickCount();
    std::cout << (e2 - e1) / cv::getTickFrequency() << std::endl;  // elapsed time in seconds
    std::cout << std::boolalpha << cv::useOptimized() << std::endl;
    

features2d

imgproc

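  • getAffineTransform: computes a planar (2D) coordinate transformation matrix from 3 pairs of corresponding points; for an example, see here. Per the source code, it takes three point correspondences as input, builds a linear system, and solves it with the solve function using LU decomposition
  • getPerspectiveTransform: computes a perspective (projective) transformation matrix, a 3×3 matrix mapping one plane to another, from 4 pairs of corresponding points; for an example, see here. Per the source code, it takes four point correspondences as input, builds a linear system, and solves it with the solve function using SVD decomposition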
  • warpAffine
  • warpPerspective
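  • grabCut: starting from a user-specified bounding box around the object to be segmented, the algorithm estimates the color distributions of the target object and the background with Gaussian mixture models. These are used to build a Markov random field over the pixel labels, with an energy function that prefers connected regions sharing the same label, and a graph-cut based optimization is run to infer the labels. Because this estimate is likely to be more accurate than the initial one taken from the bounding box, the two-step procedure is repeated until convergence. Since segmentation requires fitting Gaussian mixture models, large images are time-consuming. See zhihu: opencv实战从0到n (15)- grabcut分割(抠图)
  • matchShapes
  • matchTemplate (for some related discussion, see "opencv. how to find correlation coefficient for two images"; this function can be used to compute the correlation coefficient of two images); the implementation is in templmatch.cpp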
  • cv::integral: for example, the integral image $\texttt{sum}$ gives the sum of the image $\texttt{image}$ over the region $y_1\leq y<y_2, x_1\leq x<x_2$:
    $$\sum_{y_1\leq y<y_2}\sum_{x_1\leq x<x_2}\texttt{image}(x,y)=\texttt{sum}(x_2,y_2)-\texttt{sum}(x_1,y_2)-\texttt{sum}(x_2,y_1)+\texttt{sum}(x_1,y_1)$$
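The four-lookup identity for cv::integral can be illustrated without OpenCV. This is a minimal pure-Python sketch; `integral_image` and `box_sum` are hypothetical helper names, and the table is indexed as `total[y][x]` (row first), whereas the formula above writes $\texttt{sum}(x,y)$.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column, so that
    total[y][x] is the sum of image rows 0..y-1 and columns 0..x-1."""
    h, w = len(image), len(image[0])
    total = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            total[y + 1][x + 1] = total[y][x + 1] + row_sum
    return total


def box_sum(total, x1, y1, x2, y2):
    """Sum of the image over y1 <= y < y2, x1 <= x < x2 using four lookups."""
    return total[y2][x2] - total[y2][x1] - total[y1][x2] + total[y1][x1]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
total = integral_image(image)

# Compare against a direct double loop over the same region.
direct = sum(image[y][x] for y in range(1, 3) for x in range(1, 4))
print(box_sum(total, 1, 1, 4, 3), direct)  # 54 54
```

Once the table is built, any rectangular sum costs four lookups regardless of the rectangle size, which is why this structure shows up in window-based costs.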

structural analysis and shape descriptors

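  • findContours [1]: can take a grayscale image, but a binary image is generally recommended; each contour is returned as a sequence of individual points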

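    • A cause once seen for the computed center coordinates always coming out as integers:

      img.setTo(2, 255 - mask); // note: the "2" here was a typo; it made the later coordinates integers
      cv::findContours(img, contours, cv::RETR_LIST, cv::CHAIN_APPROX_NONE);
      // contour centers later obtained with norm or fitEllipse were all integers, or ended in .5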
      
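  • contourArea: computes the contour integral using Green's formula; it does not automatically subtract the area of interior holes (regardless of whether oriented is true), and oriented is meaningful when the detected target is a white object on a black background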

  • convexHull, moments: see the note "Math/Computer - Graphics"
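The Green's-formula area mentioned for contourArea reduces, for a polygon, to the shoelace sum. The sketch below is plain Python rather than OpenCV's implementation; `signed_area` is a hypothetical helper, and the square and hole coordinates are made up for illustration.

```python
def signed_area(points):
    """Shoelace formula (a discrete form of Green's theorem) for a closed
    polygon given as a list of (x, y) vertices."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area += x0 * y1 - x1 * y0
    return area / 2.0


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(3, 3), (7, 3), (7, 7), (3, 7)]

print(signed_area(outer))        # 100.0
print(signed_area(outer[::-1]))  # -100.0, the sign follows the traversal direction
# The outer contour alone knows nothing about the hole; subtract it explicitly.
print(abs(signed_area(outer)) - abs(signed_area(hole)))  # 84.0
```

The sign flip under reversed traversal is what an oriented area reports, and the last line shows why hole areas have to be subtracted by the caller.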

calib3d

  • initCameraMatrix2D/cvInitIntrinsicParams2D: initializes the camera intrinsic matrix (used, for example, in the stereo_calib example under samples in the source tree)

    // extract vanishing points in order to obtain initial value of the focal length

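  • cvFindHomography: see Fixed Single-Camera 3D Laser Scanning.2020.Zausa. Given points on a plane with known physical spacing and their image coordinates, compute $H$ with this function, then use the following expression to obtain the transform from the plane to the camera coordinate system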
    $$\frac{K^{-1}H}{|K^{-1}H|}=[r_1,r_2,t]$$

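    Of course, the result computed directly from the expression above can be refined with an SVD so that $r_1$ and $r_2$ are orthogonal; the plane equation can also be computed from it.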

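    • image warp: use the homography to warp the calibration-board pattern to a frontal view, then detect corners/circle centers there to improve accuracy (??). For the workflow, see the appendix code of Fixed Single-Camera 3D Laser Scanning.2020.Zausa; roughly: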

      graph LR
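      a[`findContours`<br>find contours] --> b[`approxPolyDP`<br>keep the rectangles] --> c[`erode` to shrink and keep the rectangle<br>enclosed by a wide black border] --> d[compute H with `findHomography`<br>from the known pattern aspect ratio] --> e[`warpPerspective`<br>warp to the frontal view] --> g[detect corners] --> f[`perspectiveTransform` with H_inv<br>maps the detected corners back to the original image]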
      
    • Another application example

      import cv2
      import numpy as np

      # K, imgPoints, rectWidth and rectHeight are inputs assumed to be defined elsewhere
      K_inv = np.linalg.inv(K) # K: camera intrinsic matrix
      metricPoints: np.ndarray = np.array(
          [[[0, rectHeight]], [[0, 0]], [[rectWidth, 0]], [[rectWidth, rectHeight]]],
          dtype=np.float64)
      # H must map plane (metric) coordinates to image coordinates,
      # otherwise K_inv @ H is not [r0, r1, t]
      H = cv2.findHomography(metricPoints, imgPoints)[0]
      # First, apply the inverse intrinsics to the homography to remove the camera
      result = np.matmul(K_inv, H)
      # We need to normalize our matrix to remove the scale factor
      result /= cv2.norm(result[:, 1])
      # Split homography columns to get two 2D rotation basis vectors and translation
      r0, r1, t = np.hsplit(result, 3)
      # To get the third rotation basis vector simply make the cross product
      r2 = np.cross(r0.T, r1.T).T
      # Since r0 and r1 may not be orthogonal, we use the Zhang approximation:
      # we keep only the u and vt part of the SVD of the new rotation basis matrix
      # to minimize the Frobenius norm of the difference
      _, u, vt = cv2.SVDecomp(np.hstack([r0, r1, r2]))
      R = np.matmul(u, vt)
      # We finally have our origin center and normal vector
      origin = t[:, 0]
      normal = R[:, 2]
      
  • cvRodrigues2

  • cvProjectPoints2

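  • matMulDeriv: computes the partial derivatives of a matrix product with respect to each of the two multiplied matrices; used, for example, in composeRT. See the note "Math - Calculus"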

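  • composeRT/cvComposeRT: combines the two transforms $[R_1(\text{om}_1)|T_1]$ and $[R_2(\text{om}_2)|T_2]$ into a single $[R_3(\text{om}_3)|T_3]$ (in the function's input and output parameters the rotations are all given as rotation vectors $\text{om}$, so the differentiation includes a Rodrigues conversion step)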
    $$\begin{bmatrix}R_3&T_3\cr0&1\end{bmatrix}=\begin{bmatrix}R_2&T_2\cr0&1\end{bmatrix}\begin{bmatrix}R_1&T_1\cr 0&1\end{bmatrix}=\begin{bmatrix}R_2R_1&R_2T_1+T_2\cr0&1\end{bmatrix}$$

    and computes the derivatives
    $$ \frac{\partial \text{om}_3}{\partial \text{om}_1}, \frac{\partial \text{om}_3}{\partial \text{om}_2}, \frac{\partial \text{om}_3}{\partial T_1}, \frac{\partial \text{om}_3}{\partial T_2}, \frac{\partial T_3}{\partial \text{om}_1}, \frac{\partial T_3}{\partial \text{om}_2}, \frac{\partial T_3}{\partial T_1}, \frac{\partial T_3}{\partial T_2} $$

  • stereoCalibrate/cvStereoCalibrate (see stereoCalib.md)

  • stereoRectify
    For the $Q$ matrix, look into its zero entries

  • reprojectImageTo3D: maps the (dense) disparity map computed from stereo-rectified images to a 3D point cloud ($z=bf/d=T_xf/(c_x-c_x'-d)=Z/W$). The main principle:
    $$Q\begin{bmatrix} u\cr v \cr d \cr 1 \end{bmatrix}= \begin{bmatrix}1&0&0&-c_x\cr0&1&0&-c_y\cr0&0&0&f\cr0&0&-\frac{1}{T_x}&\frac{c_x-c_x'}{T_x}\end{bmatrix}\begin{bmatrix} u\cr v \cr d \cr 1 \end{bmatrix}=\begin{bmatrix} u-c_x\cr v-c_y \cr f \cr \frac{c_x-c_x'-d}{T_x} \end{bmatrix}= \begin{bmatrix} X\cr Y \cr Z \cr W \end{bmatrix}$$

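    References: OpenCV: derivation for perspective transformation matrix (Q).2018; csdn: reprojectImageTo3D函数.2019. The function can also be implemented by hand, as in stackoverflow: reprojectImageTo3D() in OpenCV.2014. Open question: why does OpenCV's implementation of this function take the zero entries of the $Q$ matrix into account?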

  • estimateAffine2D
    $$\begin{bmatrix} x\cr y\cr \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12}\cr a_{21} & a_{22}\cr \end{bmatrix} \begin{bmatrix} X\cr Y\cr \end{bmatrix} + \begin{bmatrix} b_1\cr b_2\cr \end{bmatrix}$$

    How does it differ from getAffineTransform?

  • getOptimalNewCameraMatrix

  • stereoBM

    • pre-filter: a process similar to edge detection; “intensity normalization”
    • cost:SAD
    • post-filter
      • uniqueness ratio: the margin by which the cost of the best-matching disparity $d^{*}$ must be “smaller” than the cost of any other candidate disparity $d$
        $$\texttt{SAD}(d)\geq\texttt{SAD}(d^*)\cdot(1+\frac{\texttt{uniquenessRatio}}{100})$$

      • texture: the sum of the texture of all pixels in the window (the pre-filter output, which resembles edge texture)

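      • speckle: pixels whose disparity differs from that of a 4-neighbor by no more than speckleRange (check whether the value needs to be multiplied by 16) are connected into a region (blob); a region is kept if its pixel count is no less than speckleWindowSize, and filtered out otherwise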

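    • On acceleration:
      • split the image into row blocks and compute them in parallel
      • avoid repeated computation within the window (?), in the manner of an integral image (?)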
  • Others
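Returning to reprojectImageTo3D: the mapping through $Q$ is a 4×4 matrix-vector product followed by a division by $W$. The sketch below is plain Python and not the OpenCV implementation; `make_q` and `reproject` are hypothetical helpers, the numeric values are made up, and $T_x$ is assumed negative so that a positive disparity gives a positive depth.

```python
def make_q(f, cx, cy, cx_prime, tx):
    """Build the 4x4 Q matrix in the form written in the notes above."""
    return [[1.0, 0.0, 0.0, -cx],
            [0.0, 1.0, 0.0, -cy],
            [0.0, 0.0, 0.0, f],
            [0.0, 0.0, -1.0 / tx, (cx - cx_prime) / tx]]


def reproject(q, u, v, d):
    """Map pixel (u, v) with disparity d to the 3D point (X/W, Y/W, Z/W)."""
    vec = (u, v, d, 1.0)
    x, y, z, w = (sum(q[r][c] * vec[c] for c in range(4)) for r in range(4))
    return x / w, y / w, z / w


# Hypothetical values: f = 800 px, identical principal points, T_x = -0.1.
q = make_q(f=800.0, cx=320.0, cy=240.0, cx_prime=320.0, tx=-0.1)
point = reproject(q, u=400.0, v=240.0, d=16.0)
print(point)  # approximately (0.5, 0.0, 5.0); depth matches b*f/d = 0.1*800/16
```

With equal principal points the depth reduces to $-T_x f/d$, which is the familiar $bf/d$ with baseline $b=-T_x$ under the sign assumption stated above.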

ml

  • references
    • Additive logistic regression: a statistical view of boosting.2000.Friedman

gpu

CUDA-accelerated Computer vision

legacy

objdetect

photo

stitching

extra modules

ximgproc

  • Extended Image Processing; for the included functionality, see here
  • Namespace: ximgproc
  • Sparse match interpolators
  • Disparity map filters
    • Adaptive Manifold filter
    • Weighted Least Squares filter
    • Domain Transform filter
    • Fast Bilateral Solver
    • Fast Global Smoother filter
    • Guided filter
    • Edge Aware Interpolator
    • Robust Interpolation method of Correspondences (RIC)
    • Ridge Detection filter
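  • Some other functions
    • getDisparityVis: visualizes a disparity map
    • computeBadPixelPercent: computes the error rate relative to the ground truth
    • computeMSE: computes the mean square error relative to the ground truth
    • readGT: reads the ground-truth disparity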
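The two error measures listed above can be sketched in a few lines. This is a plain-Python illustration of the metrics only, not the ximgproc functions: `mse` and `bad_pixel_percent` are hypothetical names, the default threshold of 1.0 is an assumption, and region-of-interest handling and any fixed-point scaling of the disparity values are left out.

```python
def mse(gt, disp):
    """Mean squared error between two equally sized disparity maps."""
    diffs = [(g - d) ** 2
             for g_row, d_row in zip(gt, disp)
             for g, d in zip(g_row, d_row)]
    return sum(diffs) / len(diffs)


def bad_pixel_percent(gt, disp, thresh=1.0):
    """Percentage of pixels whose absolute disparity error exceeds thresh."""
    errors = [abs(g - d)
              for g_row, d_row in zip(gt, disp)
              for g, d in zip(g_row, d_row)]
    return 100.0 * sum(e > thresh for e in errors) / len(errors)


gt = [[10.0, 12.0], [14.0, 16.0]]    # made-up ground truth
disp = [[10.0, 13.0], [11.0, 16.0]]  # made-up estimate

print(mse(gt, disp))                # 2.5
print(bad_pixel_percent(gt, disp))  # 25.0
```

The MSE is dominated by the single large error, while the bad-pixel percentage only counts how many pixels exceed the threshold, so the two numbers answer different questions about the same map.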

phase_unwrapping

structured_light [2]

  • Not optimized for speed; slow

  • To understand it, see the decode gray code pattern tutorial in OpenCV: Structured light tutorials

    graph LR
    a["stereoRectify"] --> b["initUndistortRectifyMap"] --> c["remap"] --> d["decode"] --"disparity=Δx"--> e["reprojectImageTo3D"]
    
  • graycode

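    • two or more cameras;
    • Gray codes in both directions;
    • computes a disparity map at the projector resolution;
    • the images are already stereo-rectified, so the disparity is the difference between the two cameras' horizontal coordinates that correspond to the same projector pixel coordinate
    • the point cloud is computed from the disparity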
  • sinusoidal

    • There are three methods for computing the relative (wrapped) phase
      • PSP (Phase Shifting Profilometry; needs three images)
        $$\phi=\arctan{\frac{(1-\cos\psi)\cdot(I_3-I_2)}{\sin\psi\cdot(2I_1-I_2-I_3)}}$$
      • FTP [3] (Fourier Transform Profilometry; needs only one image, but three are still needed to compute the mask by averaging)
        $$ \phi=\operatorname{FTP}(I_1)=\arctan{\frac{\operatorname{re}(\mathcal{F}^{-1}(\mathcal{F}(I_1)))}{\operatorname{im}(\mathcal{F}^{-1}(\mathcal{F}(I_1)))}} $$
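      • FAPS [3] (Fourier-Assisted Phase Shifting)
        • The basic phase retrieval still applies FTP to a single image, but three adjacent images are needed to compensate for target motion
          $$\phi=\arctan{\frac{1-\cos\theta_2+(1-\cos\theta_1)h}{\sin\theta_1 h-\sin\theta_2}},h=\frac{I_2-I_3}{I_1-I_2},\\ \theta_1=\operatorname{FTP}(I_2)-\operatorname{FTP}(I_1),\theta_2=\operatorname{FTP}(I_3)-\operatorname{FTP}(I_2)$$
        • The projected pattern carries some regularly distributed small white dots, which are used for spatial phase unwrapping; see the original paper for details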
    • phase unwrap
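The three-image PSP formula above, read as $(1-\cos\psi)(I_3-I_2)$ over $\sin\psi\,(2I_1-I_2-I_3)$, can be checked numerically on synthetic intensities. The sketch below is plain Python; `psp_phase` is a hypothetical helper, and the image model (with $I_2$ and $I_3$ shifted by $+\psi$ and $-\psi$ relative to $I_1$) is an assumption that these notes do not state.

```python
import math


def psp_phase(i1, i2, i3, psi):
    """Wrapped phase from three phase-shifted intensities.

    atan2 is used instead of a plain arctan of the ratio so that the
    quadrant is preserved and a zero denominator is handled.
    """
    num = (1.0 - math.cos(psi)) * (i3 - i2)
    den = math.sin(psi) * (2.0 * i1 - i2 - i3)
    return math.atan2(num, den)


# Assumed model: I1 = a + b*cos(phi), I2 = a + b*cos(phi + psi), I3 = a + b*cos(phi - psi).
a, b = 0.5, 0.4              # made-up offset and modulation
psi = 2.0 * math.pi / 3.0    # made-up phase step
phi_true = 1.0               # phase to recover, in radians

i1 = a + b * math.cos(phi_true)
i2 = a + b * math.cos(phi_true + psi)
i3 = a + b * math.cos(phi_true - psi)

phi = psp_phase(i1, i2, i3, psi)
print(phi)  # approximately 1.0
```

The offset $a$ and modulation $b$ cancel in the ratio, so the recovered phase does not depend on them; the result is still wrapped to $(-\pi, \pi]$ and needs the phase unwrap step afterwards.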

References

