Altaibayar Tseveenbayar

With a master’s degree in AI and 6+ years of professional experience, Altaibayar does full-stack and mobile development with a focus on AR.

PREVIOUSLY AT

Meta

Over the last few years, average mobile phone performance has increased significantly. Whether in sheer CPU power or RAM capacity, it is now easier to do computation-heavy tasks on mobile hardware. Although these mobile technologies are headed in the right direction, there is still a lot to be done on mobile platforms, especially with the advent of augmented reality, virtual reality, and artificial intelligence.

A major challenge in computer vision is to detect objects of interest in images. The human eye and brain do an extraordinary job, and replicating this in machines is still a dream. Over recent decades, approaches have been developed to mimic this in machines, and the results keep improving.

In this tutorial, we will explore an algorithm used in detecting blobs in images. We will also use the algorithm, from the open source library OpenCV, to implement an iPhone application prototype that uses the rear camera to acquire images and detect objects in them.

OpenCV Tutorial

OpenCV is an open source library that provides implementations of major computer vision and machine learning algorithms. If you want to implement an application that detects faces, playing cards on a poker table, or even a simple application for adding effects to an arbitrary image, then OpenCV is a great choice.

OpenCV is written in C/C++ and has wrapper libraries for all major platforms. This makes it especially easy to use within the iOS environment. To use it within an Objective-C iOS application, download the OpenCV iOS framework from the official website. Please make sure you are using version 2.4.11 of OpenCV for iOS (which this article assumes you are using), as the latest version, 3.0, has some compatibility-breaking changes in how the header files are organized. Detailed information on how to install it is documented on its website.

MSER

MSER, short for Maximally Stable Extremal Regions, is one of the many methods available for blob detection within images. In simple words, the algorithm identifies contiguous sets of pixels whose outer boundary pixel intensities are higher (by a given threshold) than the inner boundary pixel intensities. Such regions are said to be maximally stable if they do not change much over a range of intensity thresholds.

Although plenty of other blob detection algorithms exist, MSER was chosen here because it has a fairly light run-time complexity of O(n log(log(n))), where n is the total number of pixels in the image. The algorithm is also robust to blur and scale, which is advantageous when it comes to processing images acquired through real-time sources, such as the camera of a mobile phone.

For the purposes of this tutorial, we will design the application to detect Toptal’s logo. The symbol has sharp corners, and that may lead one to think about how effective corner detection algorithms may be in detecting Toptal’s logo. After all, such an algorithm is both simple to use and understand. Although corner-based methods may have a high success rate when it comes to detecting objects that are distinctly separate from the background (such as black objects on white backgrounds), it would be difficult to achieve real-time detection of Toptal’s logo on real-world images, where the algorithm would be constantly detecting hundreds of corners.
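To make the idea of "maximally stable" more concrete, here is a minimal sketch of one common formulation of the stability criterion: track the area of a single growing region across thresholds and pick the threshold at which its relative growth is smallest. The function name `mostStableThreshold`, the `delta` parameter, and the sample areas are illustrative assumptions, not part of OpenCV or of this article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// areas[t] is the (positive) pixel count of one nested region at threshold t.
// Returns the threshold index t in [delta, size - delta) at which the relative
// growth (areas[t + delta] - areas[t - delta]) / areas[t] is smallest.
// Hypothetical helper for illustration only.
std::size_t mostStableThreshold(const std::vector<double>& areas, std::size_t delta) {
    assert(delta > 0 && areas.size() > 2 * delta);
    std::size_t best = delta;
    double bestGrowth = (areas[2 * delta] - areas[0]) / areas[delta];
    for (std::size_t t = delta + 1; t + delta < areas.size(); ++t) {
        const double growth = (areas[t + delta] - areas[t - delta]) / areas[t];
        if (growth < bestGrowth) {
            bestGrowth = growth;
            best = t;
        }
    }
    return best;
}
```

A region whose area barely changes around some threshold (for example 100, 102, 104 pixels at three consecutive thresholds) scores a low relative growth there, which is what marks it as stable.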

Strategy


Each image frame the application acquires through the camera is first converted to grayscale. Grayscale images have only one channel of color, but the logo will be visible nonetheless. This makes it easier for the algorithm to deal with the image and significantly reduces the amount of data the algorithm has to process, at little to no cost in detection quality.

Next, we will use OpenCV’s implementation of the algorithm to extract all MSERs. Then, each MSER will be normalized by transforming its minimum bounding rectangle into a square. This step is important because the logo may be acquired from different angles and distances, and normalizing increases tolerance of perspective distortion.
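As a rough illustration of what the grayscale step does to the data, the sketch below collapses interleaved BGRA bytes into one byte per pixel using the widely used luma weights 0.299 R + 0.587 G + 0.114 B. The function `bgraToGray` is a hypothetical stand-in written for this example; the application itself calls OpenCV's `cvtColor`, whose internal arithmetic may round slightly differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts interleaved 8-bit BGRA pixels to one gray byte per pixel.
// The alpha byte is ignored. Illustrative helper, not OpenCV's implementation.
std::vector<std::uint8_t> bgraToGray(const std::vector<std::uint8_t>& bgra) {
    assert(bgra.size() % 4 == 0);
    std::vector<std::uint8_t> gray;
    gray.reserve(bgra.size() / 4);
    for (std::size_t i = 0; i < bgra.size(); i += 4) {
        const double b = bgra[i];
        const double g = bgra[i + 1];
        const double r = bgra[i + 2];
        const double luma = 0.299 * r + 0.587 * g + 0.114 * b;
        gray.push_back(static_cast<std::uint8_t>(std::lround(luma)));
    }
    return gray;
}
```

The output holds a quarter of the input bytes, which is the data reduction the paragraph above refers to.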
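The normalization step can be pictured with a small sketch. Mapping a rectangle onto a square reduces to an affine map: shift the rectangle's center to the origin, undo its rotation, and stretch each axis to the target side length. The types `Point2` and `RotatedBox` and the function `normalizePoint` are assumptions made up for this example (with the angle in radians); the application itself relies on its GeometryUtil class and OpenCV's perspective transform.

```cpp
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };

// A rotated rectangle: center, side lengths, and rotation angle in radians.
// Hypothetical type for illustration; not OpenCV's cv::RotatedRect.
struct RotatedBox { Point2 center; double width, height, angle; };

// Maps a point inside `box` to coordinates in a side x side square.
Point2 normalizePoint(const RotatedBox& box, Point2 p, double side) {
    assert(box.width > 0 && box.height > 0);
    // Move the rectangle's center to the origin.
    const double dx = p.x - box.center.x;
    const double dy = p.y - box.center.y;
    // Undo the rectangle's rotation.
    const double c = std::cos(-box.angle);
    const double s = std::sin(-box.angle);
    const double rx = c * dx - s * dy;
    const double ry = s * dx + c * dy;
    // Stretch each axis so the rectangle fills the square, then re-center.
    return { rx * side / box.width + side / 2.0,
             ry * side / box.height + side / 2.0 };
}
```

After this mapping, the same logo seen as a wide or a tall rectangle lands in the same square frame, so the shape properties computed afterwards become comparable.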

Furthermore, a number of properties are computed for each MSER:

  • Number of holes
  • Ratio of the area of MSER to the area of its convex hull
  • Ratio of the area of MSER to the area of its minimum-area rectangle
  • Ratio of the length of MSER skeleton to area of the MSER
  • Ratio of the area of MSER to the area of its biggest contour
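Most of the properties in the list above are area ratios. As a simplified sketch of that kind of computation, the function below divides a region's pixel count by the area of its axis-aligned bounding box. This is a deliberately simpler stand-in for the minimum-area (rotated) rectangle ratio; the names `Pixel` and `boundingBoxFillRatio` are hypothetical, and the pixels are assumed to be distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Pixel { int x, y; };

// Ratio of the region's pixel count to the area of its axis-aligned bounding
// box. Assumes the region lists each pixel once. Illustrative helper only.
double boundingBoxFillRatio(const std::vector<Pixel>& region) {
    assert(!region.empty());
    int minX = region[0].x, maxX = region[0].x;
    int minY = region[0].y, maxY = region[0].y;
    for (const Pixel& p : region) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    // +1 because both the first and the last pixel row/column belong to the box.
    const double boxArea =
        static_cast<double>(maxX - minX + 1) * (maxY - minY + 1);
    return static_cast<double>(region.size()) / boxArea;
}
```

A solid rectangle scores 1.0, while thin or hollow shapes score lower, which is why ratios like this help tell one blob shape from another regardless of its size.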

In order to detect Toptal’s logo in an image, the properties of all the MSERs are compared to the already learned Toptal logo properties. For the purposes of this tutorial, the maximum allowed differences for each property were chosen empirically.

Finally, the most similar region is chosen as the result.
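The comparison and selection described above can be sketched as follows: reject candidates whose properties differ from the template by more than a per-property tolerance, then keep the one with the smallest summed difference. The `Feature` struct, its field names, and the functions below are assumptions made for this example and do not mirror the application's MSERFeature or MLManager classes exactly.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical feature vector: one value per property computed for an MSER.
struct Feature {
    double holes, hullRatio, rectRatio, skeletonRatio, contourRatio;
};

// Sum of absolute differences between two feature vectors.
double featureDistance(const Feature& a, const Feature& b) {
    return std::fabs(a.holes - b.holes) + std::fabs(a.hullRatio - b.hullRatio) +
           std::fabs(a.rectRatio - b.rectRatio) +
           std::fabs(a.skeletonRatio - b.skeletonRatio) +
           std::fabs(a.contourRatio - b.contourRatio);
}

// True if every property differs by no more than its allowed maximum.
bool withinTolerance(const Feature& a, const Feature& b, const Feature& maxDiff) {
    return std::fabs(a.holes - b.holes) <= maxDiff.holes &&
           std::fabs(a.hullRatio - b.hullRatio) <= maxDiff.hullRatio &&
           std::fabs(a.rectRatio - b.rectRatio) <= maxDiff.rectRatio &&
           std::fabs(a.skeletonRatio - b.skeletonRatio) <= maxDiff.skeletonRatio &&
           std::fabs(a.contourRatio - b.contourRatio) <= maxDiff.contourRatio;
}

// Index of the closest candidate within tolerance, or -1 if none qualifies.
int bestMatch(const std::vector<Feature>& candidates, const Feature& tmpl,
              const Feature& maxDiff) {
    int best = -1;
    double bestDist = 0.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (!withinTolerance(candidates[i], tmpl, maxDiff)) continue;
        const double d = featureDistance(candidates[i], tmpl);
        if (best < 0 || d < bestDist) {
            best = static_cast<int>(i);
            bestDist = d;
        }
    }
    return best;
}
```

Separating the tolerance check from the distance keeps the two decisions independent: the tolerances decide whether a region could be the logo at all, and the distance only ranks the regions that pass.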

iOS Application

Using OpenCV from iOS is easy. If you haven’t done it yet, here is a quick outline of the steps involved in setting up Xcode to create an iOS application and use OpenCV in it:

  1. Create a new project named “SuperCool Logo Detector”. As the language, select Objective-C.

  2. Add a new Prefix Header (.pch) file and name it PrefixHeader.pch.

  3. Go into the “SuperCool Logo Detector” project’s Build Target and, in the Build Settings tab, find the “Prefix Headers” setting. You can find it in the LLVM Language section, or use the search feature.

  4. Add “PrefixHeader.pch” to the Prefix Headers setting.

  5. At this point, if you haven’t installed OpenCV for iOS 2.4.11, do it now.

  6. Drag-and-drop the downloaded framework into the project. Check “Linked Frameworks and Libraries” in your Target Settings. (It should be added automatically, but better to be safe.)

  7. In addition, link the following frameworks:

    • AVFoundation
    • AssetsLibrary
    • CoreMedia
  8. Open “PrefixHeader.pch” and add the following 3 lines:

     #ifdef __cplusplus
     #include <opencv2/opencv.hpp>
     #endif
    
  9. Change the extensions of the automatically created code files from “.m” to “.mm”. OpenCV is written in C++, and the *.mm extension tells the compiler that you will be using Objective-C++.

  10. Import “opencv2/highgui/cap_ios.h” in ViewController.h and change ViewController to conform with the protocol CvVideoCameraDelegate:

    #import <opencv2/highgui/cap_ios.h>
    
  11. Open Main.storyboard and put a UIImageView on the initial view controller.

  12. Create an outlet to ViewController.mm named “imageView”.

  13. Create a variable “CvVideoCamera *camera;” in ViewController.h or ViewController.mm, and initialize it with a reference to the rear-camera:

    camera = [[CvVideoCamera alloc] initWithParentView: _imageView];
    camera.defaultAVCaptureDevicePosition = AVCaptureDevicePositionBack;
    camera.defaultAVCaptureSessionPreset = AVCaptureSessionPreset640x480;
    camera.defaultAVCaptureVideoOrientation = AVCaptureVideoOrientationPortrait;
    camera.defaultFPS = 30;
    camera.grayscaleMode = NO;
    camera.delegate = self;
    
  14. If you build the project now, Xcode will warn you that you didn’t implement the “processImage” method from CvVideoCameraDelegate. For now, and for the sake of simplicity, we will just acquire the images from the camera and overlay them with a simple text:

    • Add a single line to “viewDidAppear”:
    [camera start];
    
    • Now, if you run the application, it will ask you for permission to access the camera. After that, you should see the video from the camera.

    • In the “processImage” method add the following two lines:

    const char* str = [@"Toptal" cStringUsingEncoding: NSUTF8StringEncoding];
    cv::putText(image, str, cv::Point(100, 100), CV_FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(0,0,255));
    

That is pretty much it. Now you have a very simple application that draws the text “Toptal” on images from the camera. We can now build our target logo-detecting application on top of this simpler one. For brevity, in this article we will discuss only a handful of code segments that are critical to understanding how the application works overall. The code on GitHub has a fair amount of comments to explain what each segment does.

Since the application has only one purpose, to detect Toptal’s logo, as soon as it is launched, MSER features are extracted from the given template image and the values are stored in memory:

cv::Mat logo = [ImageUtils cvMatFromUIImage: templateImage];

//get gray image
cv::Mat gray;
cvtColor(logo, gray, CV_BGRA2GRAY);

//the MSER with the maximum area
std::vector<cv::Point> maxMser = [ImageUtils maxMser: &gray];

//get the 4 vertices of the maxMser's minimum-area rectangle
cv::RotatedRect rect = cv::minAreaRect(maxMser);    
cv::Point2f points[4];
rect.points(points);

//normalize image
cv::Mat M = [GeometryUtil getPerspectiveMatrix: points toSize: rect.size];
cv::Mat normalizedImage = [GeometryUtil normalizeImage: &gray withTranformationMatrix: &M withSize: rect.size.width];

//get maxMser from the normalized image
std::vector<cv::Point> normalizedMser = [ImageUtils maxMser: &normalizedImage];

//remember the template
self.logoTemplate = [[MSERManager sharedInstance] extractFeature: &normalizedMser];

//store the feature
[self storeTemplate];

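The application has only one screen with a Start/Stop button, and all necessary information, such as the FPS and the number of detected MSERs, is drawn automatically on the image. As long as the application is not stopped, the following processImage method is invoked for every image frame from the camera: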

- (void) processImage:(cv::Mat &)image
{    
    cv::Mat gray;
    cvtColor(image, gray, CV_BGRA2GRAY);
    
    std::vector<std::vector<cv::Point>> msers;
    [[MSERManager sharedInstance] detectRegions: gray intoVector: msers];
    if (msers.size() == 0) { return; }
    
    std::vector<cv::Point> *bestMser = nullptr;
    double bestPoint = 10.0;
    
    std::for_each(msers.begin(), msers.end(), [&] (std::vector<cv::Point> &mser) 
    {
        MSERFeature *feature = [[MSERManager sharedInstance] extractFeature: &mser];

        if(feature != nil)            
        {
            if([[MLManager sharedInstance] isToptalLogo: feature] )
            {
                double tmp = [[MLManager sharedInstance] distance: feature ];
                if ( bestPoint > tmp ) {
                    bestPoint = tmp;
                    bestMser = &mser;
                }
            }
        }
    });

    if (bestMser)
    {
        NSLog(@"minDist: %f", bestPoint);
                
        cv::Rect bound = cv::boundingRect(*bestMser);
        cv::rectangle(image, bound, GREEN, 3);
    }
    else 
    {
        cv::rectangle(image, cv::Rect(0,0, W, H), RED, 3);
    }

    // Omitted debug code
    
    [FPS draw: image]; 
}

This method, in essence, creates a grayscale copy of the original image. It identifies all MSERs and extracts their relevant features, scores each MSER for similarity with the template and picks the best one. Finally, it draws a green boundary around the best MSER and overlays the image with meta information.

Below are the definitions of a few important classes, and their methods, in this application. Their purposes are described in the comments.

GeometryUtil.h

/*
 This static class provides perspective transformation function
 */
@interface GeometryUtil : NSObject

/*
 Returns the perspective transformation matrix that maps the given points to a square with
 origin [0,0] and size (size.width, size.width)
 */
+ (cv::Mat) getPerspectiveMatrix: (cv::Point2f[]) points toSize: (cv::Size2f) size;

/*
 Returns a new perspectively transformed image with the given size
 */
+ (cv::Mat) normalizeImage: (cv::Mat *) image withTranformationMatrix: (cv::Mat *) M withSize: (float) size;

@end

MSERManager.h

/*
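 This singleton class provides functions related to MSERs
 */
@interface MSERManager: NSObject

+ (MSERManager *) sharedInstance;

/*
 Extracts all MSERs into the provided vector
 */
- (void) detectRegions: (cv::Mat &) gray intoVector: (std::vector<std::vector<cv::Point>> &) vector;

/*
 Extracts the feature from the MSER. For some MSERs the feature can be NULL !!!
 */
- (MSERFeature *) extractFeature: (std::vector<cv::Point> *) mser;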

@end

MLManager.h

/*
 This singleton class wraps the object recognition functions
 */
@interface MLManager: NSObject

+ (MLManager *) sharedInstance;

/*
 Stores feature from the biggest MSER in the templateImage
 */
- (void) learn: (UIImage *) templateImage;

/*
 Sum of the differences between logo feature and given feature
 */
- (double) distance: (MSERFeature *) feature;

/*
Returns true if the given feature is similar to the one learned from the template
 */
- (BOOL) isToptalLogo: (MSERFeature *) feature;

@end

Once everything is wired together, with this application, you should be able to use the camera of your iOS device to detect Toptal’s logo from different angles and orientations.

Detecting an image (the Toptal logo) vertically.

Detecting an image (the Toptal logo) diagonally on a shirt.

Augmented reality apps start with understanding images, and this is how you can do it.

Conclusion

In this article we have shown how easy it is to detect simple objects in an image using OpenCV. The entire code is available on GitHub. Feel free to fork it and send pull requests, as contributions are welcome.

As is true for any machine learning problem, the success rate of the logo detection in this application may be increased by using a different set of features and a different method for object classification. However, I hope that this article will help you get started with object detection using MSER and with applications of computer vision techniques in general.

Further Reading

  • J. Matas, O. Chum, M. Urban, and T. Pajdla. “Robust wide baseline stereo from maximally stable extremal regions.”
  • Neumann, Lukas; Matas, Jiri (2011). “A Method for Text Localization and Recognition in Real-World Images”