Why Pixel Precision is the Future of Image Annotation: An Interview with AI Researcher Vahan Petrosyan

Article by Daniel Smith | December 12, 2018

Vahan Petrosyan is completing his PhD in machine learning at KTH Royal Institute of Technology in Sweden. His research focuses on developing unsupervised and semi-supervised learning algorithms for image annotation. Based on his doctoral research, he co-founded the image annotation startup SuperAnnotate in 2018.

We sat down with Vahan to discuss recent developments in image annotation and how they will affect the massive and increasing demand for high-quality AI training data. For more expert insight into the world of machine learning, have a look at the rest of our interview series here.

Lionbridge AI: What is your particular specialism in machine learning and how did you become interested in that area?

Vahan: I first became interested in machine learning in 2012, during my bachelor's at Utah State University (USU), when I took my first course in statistical learning theory. While doing my master's in statistics there, I took part in several machine learning competitions on Kaggle. However, it was during my PhD at KTH that my interest in unsupervised and semi-supervised learning really took off. The initial goal of my research was to develop clustering algorithms that scale easily to hundreds of thousands or even millions of data points. Over the last two years, I have been applying some of those techniques to image and video segmentation, also known as superpixel/supervoxel segmentation, which became the basis of SuperAnnotate's image annotation technology.
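To make the idea of superpixel segmentation concrete, here is a purely illustrative sketch, not Vahan's algorithm: a heavily simplified SLIC-style approach that clusters pixels by intensity and (down-weighted) position, so nearby pixels with similar color end up in the same superpixel. All parameter names and weights here are assumptions chosen for the toy example.

```python
import numpy as np

def simple_superpixels(image, grid=4, iters=5, spatial_weight=0.5):
    """A much-simplified SLIC-style superpixel segmentation (illustrative only).

    image : (H, W) grayscale float array
    grid  : number of superpixel seeds along each axis
    Returns an (H, W) integer label map.
    """
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Feature vector per pixel: (intensity, weighted row, weighted col)
    feats = np.stack([image,
                      spatial_weight * ys,
                      spatial_weight * xs], axis=-1).reshape(-1, 3)

    # Initialize cluster centers on a regular grid of seed points
    cy = np.linspace(0, H - 1, grid + 2)[1:-1].astype(int)
    cx = np.linspace(0, W - 1, grid + 2)[1:-1].astype(int)
    centers = np.array([[image[y, x], spatial_weight * y, spatial_weight * x]
                        for y in cy for x in cx])

    for _ in range(iters):
        # Assign each pixel to its nearest center (k-means step)
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned pixels
        for k in range(len(centers)):
            pts = feats[labels == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)
    return labels.reshape(H, W)
```

Real superpixel algorithms restrict each center's search window and use better color spaces, which is what makes them fast enough to scale to millions of pixels; this sketch only shows the clustering intuition.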

L: You developed the algorithm behind SuperAnnotate for your PhD research. What made you decide to turn it into a business?

V: I attended CVPR, a major computer vision conference, in June 2018. While I was there, I talked to over 10 image annotation companies that sponsored the event and showed them the tool we had developed. At the time, I was considering licensing the technology and focusing more on my PhD. However, the strong interest most of these companies showed in partnering with us helped us see the potential value of what we'd created. As a result, we decided to create our own company and enter the image annotation market with our technology.

L: How did you develop your core technology?

V: During the first two years of my PhD, I was developing more generic clustering algorithms. My interest in applying them to computer vision led to the research that ultimately developed into our image segmentation technology. The initial segmentation is completely unsupervised and requires no training data. However, once a person annotates certain objects in the image, the algorithm gives higher priority to segmenting those objects while still maintaining accurate segmentation of the others. In other words, we gather training data during the annotation process.
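One way to picture one-click object selection on top of a superpixel map is the following toy sketch (an assumption for illustration, not SuperAnnotate's actual method): the click picks the superpixel under the cursor, and neighboring superpixels with similar mean color are greedily merged into the selection.

```python
import numpy as np

def one_click_select(labels, image, click_rc, color_tol=30.0):
    """Toy one-click selection over a precomputed superpixel label map.

    labels   : (H, W) int array of superpixel labels
    image    : (H, W, 3) float array of pixel colors
    click_rc : (row, col) of the user's click
    Returns a boolean (H, W) mask of the selected object.
    """
    seed = labels[click_rc]
    # Mean color of every superpixel
    n = labels.max() + 1
    means = np.array([image[labels == i].mean(axis=0) for i in range(n)])

    selected = {seed}
    frontier = [seed]
    while frontier:
        cur = frontier.pop()
        # Find superpixels adjacent to `cur` (sharing a pixel border),
        # by dilating its mask one pixel in each direction
        mask = labels == cur
        grown = np.zeros_like(mask)
        grown[:-1] |= mask[1:]; grown[1:] |= mask[:-1]
        grown[:, :-1] |= mask[:, 1:]; grown[:, 1:] |= mask[:, :-1]
        neighbors = set(np.unique(labels[grown & ~mask]))
        # Merge neighbors whose mean color is close to the clicked region's
        for nb in neighbors - selected:
            if np.linalg.norm(means[nb] - means[seed]) < color_tol:
                selected.add(nb)
                frontier.append(nb)
    return np.isin(labels, list(selected))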

L: How does SuperAnnotate’s approach to image segmentation differ from others, such as bounding boxes or polygons, and what impact do you expect your technology to have?

V: When I was developing the image segmentation technology, I designed it with annotation in mind. Unlike bounding box tools, we provide pixel-accurate annotation through our one-click object selection technology. Pixel-accurate annotation is commonly achieved with polygon tools, which are very time-consuming and costly. In comparison, bounding boxes are much cheaper and are therefore used extensively for image annotation. We have observed that our one-click object selection achieves similar pixel accuracy while being 10-20 times faster than polygon annotation tools.

Compared to other segmentation-based solutions, our technology has five main advantages:

  • The speed of our algorithm allows us to segment and annotate large 10-megapixel images in real time
  • The algorithm accurately generates non-homogeneous regions, allowing users to select both large and small objects with just one click
  • The tool allows us to change the number of segments instantly
  • It also has a manual correction feature for regions that need further refinement
  • Self-learning features improve the segmentation accuracy as the amount of annotated data increases

Over the last couple of weeks, we have been comparing our pixel-accurate annotation with manual bounding box annotation tools and have found that we can be up to 3x faster. As our tool and others like it hit the mainstream, we expect that the demand for bounding boxes will eventually disappear. Pixel-accurate annotation will be the new norm.

L: As someone who does image segmentation every day, you must see a great difference in output depending on the quality of the images you receive. Are there any possible solutions to this on the technology side and what progress is being made towards resolving the issue?

V: The quality of the images certainly plays a big role in annotation and, ultimately, in recognition quality. However, recent research has made significant progress towards reducing its impact. The community is developing many exciting pre-processing algorithms that can be used to improve image quality and ensure better segmentation.

Even as we annotate images for AI training, the self-learning features of tools like ours are helping to improve segmentation quality on low-quality images. This is crucial for customers who want to solve an important recognition problem but lack deep expertise in computer vision. Innovation around this issue is giving pixel-precise annotation a greater ROI for customers and positioning it as the annotation technique of the future.

L: What will be the next big breakthrough in image annotation and what is needed to make it happen?

V: I think pixel-accurate video annotation will be the next big breakthrough, building on the progress made in image annotation. In video annotation, the goal is to select an object in one frame and then accurately track it in consecutive frames. This is actually one of our main research interests, and we're planning to bring it to market in early spring 2019. Video object recognition and tracking can be especially valuable for industries like autonomous vehicles or security and surveillance.
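The "select in one frame, find in the next" idea can be illustrated with a deliberately minimal tracker, not SuperAnnotate's approach: take the selected patch as a template and search a small window in the next frame for the position with the lowest sum of squared differences. The function name and parameters are assumptions for this toy example.

```python
import numpy as np

def track_patch(frame_prev, frame_next, top_left, size, search_radius=5):
    """Track a square patch from one grayscale frame to the next
    by exhaustive template matching (sum of squared differences).

    frame_prev, frame_next : 2-D float arrays (grayscale frames)
    top_left               : (row, col) of the patch in frame_prev
    size                   : patch side length in pixels
    search_radius          : how far to search around the old position
    Returns the (row, col) of the best-matching patch in frame_next.
    """
    r0, c0 = top_left
    template = frame_prev[r0:r0 + size, c0:c0 + size]

    best, best_pos = np.inf, top_left
    H, W = frame_next.shape
    for dr in range(-search_radius, search_radius + 1):
        for dc in range(-search_radius, search_radius + 1):
            r, c = r0 + dr, c0 + dc
            # Skip candidate windows that fall outside the frame
            if r < 0 or c < 0 or r + size > H or c + size > W:
                continue
            candidate = frame_next[r:r + size, c:c + size]
            ssd = np.sum((candidate - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```

Pixel-accurate video annotation is much harder than this rigid-patch example suggests, since objects deform, occlude each other, and change appearance between frames, which is why it is an open research problem.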

L: Are there any industry uses of image segmentation that particularly excite you?

V: Pixel-accurate segmentation of medical images is an exciting field that we're following closely. We really hope that medical data will become far more accessible in the near future, as that would allow the community to develop algorithms for early-stage diagnosis of various diseases.

L: Finally, do you have any other advice for anyone looking to build an algorithm based on image annotation data?

V: I think the most important questions people should consider when trying to create or improve their own computer vision models are these: Is this problem learnable? What recognition accuracy would you expect with a certain number of annotated images, and will it be high enough to satisfy your needs? An excellent first step is to consult different computer vision specialists to better understand the complexity of the problem before starting the annotation process.

The Author
Daniel Smith

Daniel writes a variety of content for Lionbridge’s website as part of the marketing team. Born and raised in the UK, he first came to Japan by chance in 2013 and is continually surprised that no one has thrown him out yet. Outside of Lionbridge, he loves to travel, take photos and listen to music that his neighbors really, really hate.

