Liberating Machine Vision From the Machines

Until recently, computer vision (used most widely in manufacturing) and mainstream computing technology existed in parallel worlds. Like other factory-floor technologies, computer vision tends to be machine-specific and hardware-driven, and it makes little if any use of the Internet. Many of the advances we take for granted in modern computing, such as ubiquitous connectivity, unlimited data storage in the cloud, and insights drawn from massive unstructured data sets, have yet to be applied systematically to the factory floor in general and to computer vision in particular.

It’s no surprise when you consider that until recently most computer vision software was written by computer vision hardware makers, built on embedded systems without open APIs. What comes to mind when you think of the software that came bundled with your scanner, your Wi-Fi router, your car’s navigation system? Balky, inflexible and unintuitive. The software isn’t much more than a utility to run the hardware.

But this closed world is being broken open by a convergence of emerging technologies:

  • The proliferation of cheap, high pixel-density camera sensors
  • Open implementations of vision algorithms, machine learning, and statistical tools
  • Large amounts of cheap computing power, becoming virtually limitless in the cloud

These technologies offer all the raw materials needed for a massive shift in how computer vision is practiced. It’s a shift from focusing on the raw material of visual data — the pixels and bitmaps generated by specific cameras — to extracting data from images and using statistical and data science techniques to draw insights.

This new approach to computer vision has a powerful application amid an American manufacturing renaissance emphasizing rapid product cycles and mass customization. Whereas the archetypal American factory was built around systematic, repeatable function, modern manufacturing is about flexibility, adaptability and high efficiency. We’ve gone from Henry Ford’s “any colour he wants so long as it is black” to Google’s Moto X phone — customer-configured, manufactured in the U.S. and delivered within four days.

Unrelenting Quality Demands

But that need for flexibility on the manufacturing line is in tension with unrelenting quality demands that manufacturers face across industries and down supply chains. Despite huge investments in quality control, automakers recalled nearly as many cars as they sold in the U.S. in 2012. Ford and GM made warranty payments of $5.7 billion in 2012, more than half of the $10.5 billion they reported in net income. Automakers are now paying suppliers prices based on benchmarks like defects per million, terminating those who fall below thresholds, and pushing liability for warranty claims down to their suppliers.

While automation has transformed much of manufacturing, a surprising amount of quality control is still done by hand or otherwise relies on human judgment. Many types of inspection require visual evaluation, but manufacturers' experience with computer vision in quality control has been a frustrating one. Walk into a factory and ask the manager about computer vision, and you are likely to hear a variant of, "Oh yeah, we tried that, it didn't work very well, we had to throw it out."

Existing machine vision uses a 30-year-old architecture that’s capital-intensive and severely constrained in its abilities. Today’s computer vision systems operate as stand-alone islands, rarely connected to the Internet. Every time needs change, each installation has to be manually reprogrammed, unit by unit.

Worse still, little data is kept, making it difficult to spot trends or find correlations among multiple variables. Most machine vision quality inspection in manufacturing today is pass/fail: if the first parts of a production run pass inspection, the machines are turned on and the test data is overwritten.

The New Computer Vision

The new computer vision, liberated from its hardware shackles and empowered by connectivity, unlimited data storage and Big Data-style statistical analysis, is beginning to change the role of vision in manufacturing. Instead of being a reactive tool to detect defects, computer vision is becoming a data collection tool supporting defect prevention initiatives, improving understanding of complex processes, and enabling greater collaboration across entire supply chains in real time.

With modern web services, once the data is collected it is easily aggregated into dashboards and distributed to production workers, quality engineers, and management, locally or around the globe. Manufacturers can share data with supply chain partners, making it easier to monitor their suppliers or to satisfy reporting requirements for customers.

One of our customers, a large manufacturer of high-quality bolts and other fasteners for automakers, is bringing this vision to life. Its system uses computer vision to analyze the grain pattern of bolts. If the pattern is wrong (for example, if the grain lines end on a load-bearing surface), the bolt head can shear off when a factory worker torques it down or, worse, when it is already holding an engine block in place.

The company is capturing images using a $100 scanner purchased at Best Buy. All the intelligence is in the software, running remotely on Amazon’s cloud computing platform. The system compares each image to thousands of other metal grain photos stored in the cloud, looking for patterns that correlate with part failure.
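The comparison against stored photos can be pictured as a nearest-neighbor lookup. The sketch below is a minimal illustration under stated assumptions, not the author's actual system: it assumes each archived grain photo has already been reduced to a numeric feature vector (feature extraction is not shown) and tagged with whether that bolt later failed. A new image is scored by the share of failures among its k most similar archived images. The names `Record` and `failure_risk`, and the toy archive, are hypothetical.

```python
import heapq
import math
from typing import List, Tuple

# Hypothetical record: (feature vector for one grain image, did that bolt later fail?)
Record = Tuple[List[float], bool]


def failure_risk(query: List[float], archive: List[Record], k: int = 5) -> float:
    """Return the fraction of the k archived images closest to `query` whose bolts failed."""
    if not archive:
        raise ValueError("archive is empty")
    # Euclidean distance in feature space stands in for "visual similarity".
    nearest = heapq.nsmallest(k, archive, key=lambda rec: math.dist(query, rec[0]))
    return sum(failed for _, failed in nearest) / len(nearest)


# Toy archive: two clusters of feature vectors, one associated with failures.
archive = [
    ([1.0, 1.0], True), ([1.1, 0.9], True), ([0.9, 1.1], True),
    ([0.0, 0.0], False), ([0.1, 0.0], False), ([0.0, 0.1], False),
]
print(failure_risk([0.95, 1.0], archive, k=3))   # query sits in the "failed" cluster
print(failure_risk([0.05, 0.05], archive, k=3))  # query sits in the "ok" cluster
```

A plain sort would also work; `heapq.nsmallest` simply avoids sorting thousands of records when only k are needed.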

The bolt maker is now exploring extending its computer vision system to its steel supplier, which would capture images of the metal grain from each batch of steel rods it ships to the fastener maker. The fastener maker would then be able to analyze ever-larger data sets to correlate grain patterns in the steel rods with quality measurements in the finished bolts.

Rather than examining a single station in isolation, companies can use large data sets to trace complex interactions down the production line and across the supply chain. Upstream stations may produce parts that are each technically within tolerance, yet certain combinations of acceptable variation can cause defects downstream, after installation.

For our bolt-making customer, the raw material (a steel rod) and the batch of bolts made from that rod may each be well within spec, but retrospective data analysis may show that certain combinations of grain pattern in the steel rods lead to higher failure rates on bolts used for specific applications.
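The kind of retrospective analysis described above can be sketched as comparing failure rates for single variables against failure rates for combinations of variables. This is a minimal illustration, not the customer's method: the field names `rod_grain` and `application`, the 0.3 threshold, and the records are all synthetic and hypothetical. The point is that each variable looks acceptable on its own while one pairing stands out.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def failure_rates(records: List[dict], fields: Tuple[str, ...]) -> Dict[tuple, float]:
    """Failure rate for every observed combination of values of `fields`."""
    tally = defaultdict(lambda: [0, 0])  # combination -> [failures, total]
    for rec in records:
        key = tuple(rec[f] for f in fields)
        tally[key][0] += int(rec["failed"])
        tally[key][1] += 1
    return {key: fails / total for key, (fails, total) in tally.items()}


def risky_combinations(records: List[dict], fields: List[str], threshold: float):
    """Pairs of field values whose joint failure rate exceeds `threshold`."""
    flagged = []
    for pair in combinations(fields, 2):
        for values, rate in failure_rates(records, pair).items():
            if rate > threshold:
                flagged.append((pair, values, rate))
    return flagged


# Synthetic data: 10 bolts per (rod grain class, application) cell; only B + engine fails.
records = []
for grain in ("A", "B"):
    for use in ("engine", "trim"):
        for i in range(10):
            failed = grain == "B" and use == "engine" and i < 4
            records.append({"rod_grain": grain, "application": use, "failed": failed})

print(failure_rates(records, ("rod_grain",)))    # each grain class alone: at most 20%
print(failure_rates(records, ("application",)))  # each application alone: at most 20%
print(risky_combinations(records, ["rod_grain", "application"], threshold=0.3))
```

With real data, sample sizes per combination matter: a rate computed from a handful of parts can exceed any threshold by chance.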

As automakers adopt the system, it will gain even more power. Should an automaker report that the fastener maker's bolts are breaking and leading to warranty repairs, the parts supplier now has the analytical tools to find the source of the problem. It can determine whether the failed bolts came from a particular batch of steel rods or were made on a day when its line was adjusted to a specific tolerance, or whether the problem lies not with the bolt at all but with the worker on the left side of the assembly line who consistently overtorques the engine bolts.
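A first pass at that kind of root-cause question can be sketched as ranking each candidate factor value by how far its failure rate exceeds the overall rate. This is a hedged illustration only: the factor names `rod_batch`, `line_setting` and `station` and the eight synthetic records are hypothetical, and a high rate for one factor value signals where to look, not proof of cause.

```python
from collections import Counter
from itertools import product
from typing import List, Tuple


def rank_suspects(records: List[dict], factors: List[str]) -> List[Tuple[float, str, object, float]]:
    """Rank (factor, value) pairs by how far their failure rate exceeds the overall rate.

    Returns tuples of (lift over overall rate, factor, value, failure rate), largest lift first.
    """
    overall = sum(bool(r["failed"]) for r in records) / len(records)
    suspects = []
    for factor in factors:
        totals = Counter(r[factor] for r in records)
        failures = Counter(r[factor] for r in records if r["failed"])
        for value, n in totals.items():
            rate = failures[value] / n
            suspects.append((rate - overall, factor, value, rate))
    # Sort on lift only, so values of different types are never compared.
    suspects.sort(key=lambda s: s[0], reverse=True)
    return suspects


# Synthetic warranty data: every failure traces back to the left-side station.
records = [
    {"rod_batch": b, "line_setting": s, "station": st, "failed": st == "left"}
    for b, s, st in product(("R1", "R2"), ("T1", "T2"), ("left", "right"))
]
print(rank_suspects(records, ["rod_batch", "line_setting", "station"])[0])
```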

Once the captured data is in the cloud, such systems can store it indefinitely for retrieval and reanalysis at any time. They let plants run correlations over time, track trends and identify root causes and, as new variables of interest arise, go back and analyze previously acquired data.

As each plant gets smarter, the whole system gets smarter. Just as Google learns more about consumers with every search and click, we are able to aggregate what we learn from quality issues common across industries.

Ultimately, vision can turn physical world challenges into Big Data problems. We know how to solve these Big Data problems better and better every day.

(Written by Jon Sobel, CEO and co-founder of Sight Machine Inc.)

The future of computer vision

Within 20 years, computer vision will be a commodity component within the fabric of the worldwide analytics infrastructure, which, like today's telecommunications infrastructure, will contain distributed analytics and database services. Application-specific analytics and intelligence will be added to all devices by default within the Internet of All Things (IoAT), including visual, audio, textual, numerical and sensor analytics. A few new Neural Computing (NC) architectures will be standardized in silicon, applicable to all forms of data.

Major government and corporate initiatives, comparable to the space race, are currently underway to create artificial brains that will contribute to the NC of the future. Future systems will contain application-specific mixtures of NCs, CPUs, GPUs, sensor processors, and IO. The underlying technology will be a near zero-cost commodity, and the revenue will come from services, similar to phone or cable services.

Imaging devices will be more accurate, with more on-chip processing power for image processing and analytics. Image processing algorithms will be similar to those used today, with no major innovations expected. The computer vision community will standardize on a few feature descriptors and feature-learning architectures, enabling a generic NC platform for application-specific innovation and market growth.

Computer vision and analytics systems will be far superior to the primitive deep learning models in use today, combining deep learning with multivariate wide learning, improved feature descriptor models, and comprehensive training protocols enabled by ubiquitous databases of labeled samples of every kind of data: images, audio, text, financial records, and information about any person, place or thing. Personal privacy will virtually disappear.

Within 20 years, most mobile and hand-held devices will contain NCs connected to remote analytics services, allowing individuals as well as business, commercial, governmental, military, law enforcement and legal organizations to perform combined audio, visual, historical, and textual evaluations for applications such as shopping, tourism, employment interviews, banking, commerce, law enforcement and housing.

Neural computers will evaluate facial expression, body language and clothing style for emotions and intentions, and will analyze the tone and rhythm of spoken words for latent intentions and assumptions, drawing as well on the words in email, texts and blogs, on historical records from local governments and academic institutions, and on purchasing records and other financial transactions.

The analytics will provide scenarios, what-if analysis and predictions of future behavior within a given set of circumstances, for example allowing a commercial enterprise to design situations or opportunities that suit its preferences and influence purchasing behavior, or allowing governments to develop policies and propaganda and test a population's reactions, preferences, intentions and personal beliefs.

Computer vision will be a central component of the future analytics infrastructure. Imagine government policies and business plans designed around predictions generated by one NC, with each resulting program evaluated by another NC to form recommendations, and the best recommendation chosen by yet another NC and sent to the final decision authority: a human… or an NC?


Energy-friendly chip can perform powerful artificial-intelligence tasks

Advance could enable mobile devices to implement “neural networks” modeled on the human brain.

In recent years, some of the most exciting advances in artificial intelligence have come courtesy of convolutional neural networks, large virtual networks of simple information-processing units, which are loosely modeled on the anatomy of the human brain.

Neural networks are typically implemented using graphics processing units (GPUs), special-purpose graphics chips found in all computing devices with screens. A mobile GPU, of the type found in a cell phone, might have almost 200 cores, or processing units, making it well suited to simulating a network of distributed processors.

At the International Solid State Circuits Conference in San Francisco this week, MIT researchers presented a new chip designed specifically to implement neural networks. It is 10 times as efficient as a mobile GPU, so it could enable mobile devices to run powerful artificial-intelligence algorithms locally, rather than uploading data to the Internet for processing.

Neural nets were widely studied in the early days of artificial-intelligence research, but by the 1970s, they’d fallen out of favor. In the past decade, however, they’ve enjoyed a revival, under the name “deep learning.”

“Deep learning is useful for many applications, such as object recognition, speech, face detection,” says Vivienne Sze, the Emanuel E. Landsman Career Development Assistant Professor in MIT’s Department of Electrical Engineering and Computer Science whose group developed the new chip. “Right now, the networks are pretty complex and are mostly run on high-power GPUs. You can imagine that if you can bring that functionality to your cell phone or embedded devices, you could still operate even if you don’t have a Wi-Fi connection. You might also want to process locally for privacy reasons. Processing it on your phone also avoids any transmission latency, so that you can react much faster for certain applications.”

The new chip, which the researchers dubbed “Eyeriss,” could also help usher in the “Internet of things” — the idea that vehicles, appliances, civil-engineering structures, manufacturing equipment, and even livestock would have sensors that report information directly to networked servers, aiding with maintenance and task coordination. With powerful artificial-intelligence algorithms on board, networked devices could make important decisions locally, entrusting only their conclusions, rather than raw personal data, to the Internet. And, of course, onboard neural networks would be useful to battery-powered autonomous robots.

Division of labor

A neural network is typically organized into layers, and each layer contains a large number of processing nodes. Data come in and are divided up among the nodes in the bottom layer. Each node manipulates the data it receives and passes the results on to nodes in the next layer, which manipulate the data they receive and pass on the results, and so on. The output of the final layer yields the solution to some computational problem.
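The layer-by-layer flow described above can be sketched in a few lines. This is a minimal illustration, not the network from the MIT work: it assumes fully connected layers (every node sees every output of the layer below) and a ReLU activation, neither of which the article specifies, and the weights are made-up numbers rather than trained values.

```python
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (one weight row per node, one bias per node)


def relu(x: float) -> float:
    return max(0.0, x)


def layer_forward(inputs: Vector, layer: Layer) -> Vector:
    """Each node takes a weighted sum of its inputs plus a bias, then applies ReLU."""
    weights, biases = layer
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b) for row, b in zip(weights, biases)]


def network_forward(inputs: Vector, layers: List[Layer]) -> Vector:
    """Feed data in at the bottom layer and pass each layer's output up to the next."""
    activations = inputs
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations  # output of the final layer


layers = [
    ([[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0]),  # bottom layer: 2 nodes
    ([[2.0, 0.5]], [0.1]),                      # final layer: 1 node
]
print(network_forward([1.0, 2.0], layers))
```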

In a convolutional neural net, many nodes in each layer process the same data in different ways. The networks can thus swell to enormous proportions. Although they outperform more conventional algorithms on many visual-processing tasks, they require much greater computational resources.
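To make "many nodes process the same data in different ways" concrete, the sketch below slides several small filters over the same input image, producing one output map per filter, and counts the multiply-accumulate operations involved. It is a simplified illustration: single-channel input, stride 1, no padding, and hand-picked edge filters standing in for the learned filters of a real network.

```python
from typing import List

Grid = List[List[float]]


def conv2d(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2D convolution as used in CNNs (no padding, stride 1, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def conv_layer(image: Grid, kernels: List[Grid]) -> List[Grid]:
    """Every filter scans the same input, yielding one feature map per filter."""
    return [conv2d(image, k) for k in kernels]


def multiply_accumulates(image: Grid, kernels: List[Grid]) -> int:
    """Number of multiply-accumulate operations the layer performs."""
    total = 0
    for k in kernels:
        out_h = len(image) - len(k) + 1
        out_w = len(image[0]) - len(k[0]) + 1
        total += out_h * out_w * len(k) * len(k[0])
    return total


image = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]  # a vertical edge on the right
vertical_edge = [[-1, 1]]
horizontal_edge = [[-1], [1]]
print(conv_layer(image, [vertical_edge, horizontal_edge]))
print(multiply_accumulates(image, [vertical_edge, horizontal_edge]))
```

The operation count grows with image size times filter size times the number of filters, which is why real convolutional layers, with many filters over large multi-channel images, demand so much more computation than conventional algorithms.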

The particular manipulations performed by each node in a neural net are the result of a training process, in which the network tries to find correlations between raw data and labels applied to it by human annotators. With a chip like the one developed by the MIT researchers, a trained network could simply be exported to a mobile device.
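The split between training and deployment can be sketched with a deliberately tiny model: a single logistic node fitted by stochastic gradient descent to labeled examples, whose learned parameters are then serialized and reloaded for inference only. This is a simplified stand-in under stated assumptions: real networks have many layers and are trained by backpropagation, the logical-OR data substitutes for human-annotated samples, and the JSON export format is hypothetical, not how Eyeriss receives a network.

```python
import json
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (raw input features, human-applied label 0 or 1)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples: List[Sample], epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit one logistic node by stochastic gradient descent on log-loss."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log-loss with respect to the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(model: dict, x: List[float]) -> int:
    z = sum(wi * xi for wi, xi in zip(model["weights"], x)) + model["bias"]
    return 1 if z > 0 else 0


# Training side: learn from labeled examples, then export the learned parameters.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
weights, bias = train(data)
exported = json.dumps({"weights": weights, "bias": bias})

# Device side: load the parameters and run inference only; no training code is needed.
model = json.loads(exported)
print([predict(model, x) for x, _ in data])
```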

This application imposes design constraints on the researchers. On one hand, the way to lower the chip’s power consumption and increase its efficiency is to make each processing unit as simple as possible; on the other hand, the chip has to be flexible enough to implement different types of networks tailored to different tasks.

Sze and her colleagues — Yu-Hsin Chen, a graduate student in electrical engineering and computer science and first author on the conference paper; Joel Emer, a professor of the practice in MIT’s Department of Electrical Engineering and Computer Science, and a senior distinguished research scientist at the chip manufacturer NVidia, and, with Sze, one of the project’s two principal investigators; and Tushar Krishna, who was a postdoc with the Singapore-MIT Alliance for Research and Technology when the work was done and is now an assistant professor of computer and electrical engineering at Georgia Tech — settled on a chip with 168 cores, roughly as many as a mobile GPU has.

Act locally

The key to Eyeriss’s efficiency is to minimize the frequency with which cores need to exchange data with distant memory banks, an operation that consumes a good deal of time and energy. Whereas many of the cores in a GPU share a single, large memory bank, each of the Eyeriss cores has its own memory. Moreover, the chip has a circuit that compresses data before sending it to individual cores.
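The article does not say which compression scheme the chip uses. The sketch below assumes run-length coding of zeros, one common way to shrink neural-network data, since activations after a ReLU layer are often mostly zero. It shows the round trip only; the pair format `(zeros_preceding, value)` is an illustrative assumption, not Eyeriss's on-chip encoding.

```python
from typing import List, Tuple


def rlc_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Encode as (zeros_preceding, nonzero_value) pairs; trailing zeros become (run, 0)."""
    encoded = []
    zeros = 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            encoded.append((zeros, v))
            zeros = 0
    if zeros:
        encoded.append((zeros, 0))
    return encoded


def rlc_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    """Invert rlc_encode: expand each zero run, then emit the value if it is nonzero."""
    decoded: List[int] = []
    for zeros, v in pairs:
        decoded.extend([0] * zeros)
        if v != 0:
            decoded.append(v)
    return decoded


# Sparse activations: 9 values shrink to 3 pairs.
activations = [0, 0, 5, 0, 0, 0, 3, 0, 0]
packed = rlc_encode(activations)
print(packed)
print(rlc_decode(packed) == activations)
```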

Each core is also able to communicate directly with its immediate neighbors, so that if they need to share data, they don’t have to route it through main memory. This is essential in a convolutional neural network, in which so many nodes are processing the same data.
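Why neighbor-to-neighbor sharing saves memory traffic can be seen in a one-dimensional convolution, where adjacent outputs use overlapping input windows. The sketch below is a sequential simulation in which each loop iteration plays the role of one core, and it simply counts main-memory reads. It illustrates the general data-reuse idea only; the article does not describe Eyeriss's actual dataflow, which works on 2D data across a grid of cores.

```python
from typing import List, Tuple


def conv1d_fetch_everything(x: List[float], w: List[float]) -> Tuple[List[float], int]:
    """Each core computes one output and fetches its whole input window from main memory."""
    k = len(w)
    outputs, reads = [], 0
    for i in range(len(x) - k + 1):
        window = x[i:i + k]
        reads += k  # k main-memory reads for this core
        outputs.append(sum(a * b for a, b in zip(window, w)))
    return outputs, reads


def conv1d_neighbor_sharing(x: List[float], w: List[float]) -> Tuple[List[float], int]:
    """Each core reuses k - 1 inputs handed over by its neighbor and fetches only one new one."""
    k = len(w)
    if len(x) < k:
        return [], 0
    window = x[:k]
    reads = k  # the first core fetches its full window
    outputs = [sum(a * b for a, b in zip(window, w))]
    for i in range(k, len(x)):
        window = window[1:] + [x[i]]  # k - 1 values from the neighbor, 1 from main memory
        reads += 1
        outputs.append(sum(a * b for a, b in zip(window, w)))
    return outputs, reads


x = [1, 2, 3, 4, 5, 6]
w = [1, 0, -1]
print(conv1d_fetch_everything(x, w))
print(conv1d_neighbor_sharing(x, w))
```

Both versions compute the same outputs, but with sharing each input value is read from main memory once instead of up to k times.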

The final key to the chip’s efficiency is special-purpose circuitry that allocates tasks across cores. In its local memory, a core needs to store not only the data manipulated by the nodes it’s simulating but data describing the nodes themselves. The allocation circuit can be reconfigured for different types of networks, automatically distributing both types of data across cores in a way that maximizes the amount of work that each of them can do before fetching more data from main memory.

At the conference, the MIT researchers used Eyeriss to implement a neural network that performs an image-recognition task, the first time that a state-of-the-art neural network has been demonstrated on a custom chip.

“This work is very important, showing how embedded processors for deep learning can provide power and performance optimizations that will bring these complex computations from the cloud to mobile devices,” says Mike Polley, a senior vice president at Samsung’s Mobile Processor Innovations Lab. “In addition to hardware considerations, the MIT paper also carefully considers how to make the embedded core useful to application developers by supporting industry-standard [network architecture] AlexNet and [deep-learning framework] Caffe.”

The MIT researchers’ work was funded in part by DARPA.

(Source: Larry Hardesty, MIT News Office)

Harvard IACS Seminar: Smart Home Robots

When:        Fri., Nov. 13, 2015, 1 – 2 p.m.

Where:       Harvard SEAS Campus Maxwell Dworkin Bldg. G115, Cambridge MA

Host:          Harvard Institute for Applied Computational Science

Speaker:    Chris Jones, director of strategic technology, iRobot Corporation


The Smart Home market is forecast to be a multi-hundred-billion-dollar market by 2025, with a typical family home containing more than 500 connected devices and sensors by that time. These will range from connected lights and thermostats, to cameras and HVAC circulation vents, to door locks and chore robots. While this market is currently growing rapidly and its forecasts are compelling, the ecosystem will need to address growing complexity and usability challenges to hit those forecasts and achieve long-term viability. It is not practical to assume the average consumer will be willing and able to configure the multitude of low-level interactions among hundreds of diverse connected devices to achieve the desired high-level smart home functionality. This talk will outline how a physical understanding of the home (e.g., maps), built and maintained by home robots, can help address these challenges.
