Online video is a huge part of our connected world today. It’s a medium we use daily to share, communicate, learn, and of course, be entertained – and there seems to be no limit to its growth. Facebook is a great example – they are now getting an amazing 8 billion video views a day, more than double what they saw 6 months earlier. According to a recent Cisco report, video traffic will be 80% of all consumer Internet traffic in 2019, up from 64% in 2014, and mobile video will increase 11x over the next 5 years. In China alone, the online video market is expected to reach more than $17B by 2018, according to iResearch.
As a video and tech enthusiast, these developments are hugely exciting, but this relentless deluge of video does indeed present some very real challenges (and opportunities). Infrastructure challenges are obvious, given the need for increased storage and compute to process, transcode and manipulate videos for end user consumption. However, there is another, less straightforward, problem to overcome. How can viewers best navigate the flood of online video content? And how can content providers and advertisers efficiently and intelligently provide video content that is relevant (and useful) to consumers?
This is certainly a daunting task, and one that we as humans are ill-equipped to handle on our own. Frankly, it is no wonder that many companies are investigating intelligent systems that leverage machine learning and deep neural networks (DNNs) to help automate these tasks.
With this in mind, Intel, Quanta, and Viscovery came together to build a full-stack, turnkey solution designed specifically for the video content recognition problem, combining a deep learning-based application from Viscovery, the power and scalability of Intel® Xeon® processors, and Quanta’s efficient platform designs. At Intel, we recognize that it is critical to take a holistic view when tackling these types of challenges and to enable solutions that span everything from the silicon and server hardware, to the libraries and open source components, all the way to the end application. And of course, all of these ingredients must be optimized for cloud-scale deployments. Below is a high-level view of the solution stack:
To tackle these problems at scale, libraries like the Intel® Math Kernel Library and optimized open source components like Caffe* are tightly integrated into Viscovery’s deep learning-based video content recognition engine to take full advantage of the performance of Intel® processors. The result is a solution that runs seamlessly across Intel® Xeon® and Intel® Xeon Phi™ processor-based platforms, providing the ability to train DNNs quickly and deploy at scale with an efficient total cost of ownership. Below is an example of the types of content the Viscovery application uses to train its DNNs. As you can see, they have moved significantly beyond simple image and object classification:
| Modality | Target            |
|----------|-------------------|
| Facial   | Human/Animal      |
| Image    | Brand/Logo        |
| Text     | OCR in the wild   |
| Audio    | Speech/Music      |
| Motion   | Action/Video2Text |
| Object   | Brand/Model       |
| Scene    | Location/Event    |
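For readers who want to see what the CPU path looks like in practice, here is a minimal, hypothetical sketch of pushing a single decoded video frame through a Caffe model on an Intel Xeon processor. It assumes Caffe was built with BLAS := mkl in its Makefile.config so that Intel MKL backs the underlying math; the file names (deploy.prototxt, frame_classifier.caffemodel, frame_00042.jpg) and the 'prob' output blob are illustrative placeholders, not part of Viscovery's proprietary engine.

```python
# Minimal sketch: classify one video frame with a Caffe model on CPU.
# Assumes Caffe was built with "BLAS := mkl" in Makefile.config so Intel MKL
# handles the BLAS calls. Model and image file names below are hypothetical.
import numpy as np
import caffe

caffe.set_mode_cpu()  # run inference on the Xeon CPU

# Hypothetical network definition and trained weights
net = caffe.Net('deploy.prototxt', 'frame_classifier.caffemodel', caffe.TEST)

# Standard Caffe preprocessing: HxWxC float [0,1] RGB -> CxHxW BGR, 0-255, mean-subtracted
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_mean('data', np.array([104.0, 117.0, 123.0]))  # example per-channel mean

frame = caffe.io.load_image('frame_00042.jpg')  # one decoded video frame
net.blobs['data'].data[...] = transformer.preprocess('data', frame)

# 'prob' is whatever output blob the deploy prototxt defines; here it is a softmax over labels
probs = net.forward()['prob'][0]
top5 = probs.argsort()[::-1][:5]
print('top-5 label indices:', top5, 'scores:', probs[top5])
```

Most of the time in a forward pass like this is spent in dense linear algebra, which is exactly the kind of work MKL accelerates, so the same application code can be deployed across Xeon and Xeon Phi processor-based platforms.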
Of course, the real proof of success is adoption of this platform by end customers. Leaders in video content delivery such as LeEco, YouKu, 8sian, Alimama (part of Alibaba) and many others have already deployed solutions based on this stack.
If you’re at Computex this month, you can check out this video discovery service in action running on our Intel Xeon and Intel Xeon Phi processors at Quanta’s booth and during Intel’s keynote speech by Diane Bryant. And with any luck, as video content recognition capabilities continue to advance, you’ll never find yourself watching irrelevant or unwanted video content again.