Google and MIT Take Giant Leap in Mobile Image Processing

Google & MIT Machine Learning

To say that digital photography has changed in the last ten years would be an understatement bordering on absurdity. Since I graduated from high school, we’ve gone from blurry, low-res camera phones to phones whose cameras rival some of the best prosumer cameras on the market. I can take a 12-megapixel image, run it through Photoshop, save it to my Dropbox and post it online, all without leaving my iPhone. This creates a problem, though: consumers like me now expect phones to do pretty much everything, and to do it well, without long processing times and without handing the work off to a desktop. Even mid-range mobile devices now ship with cameras of many megapixels, so fast high-resolution image processing on mobile is no longer a luxury. It’s an outright necessity, and one that Moore’s Law is in no hurry to fix.

Enter MIT and Google (and are you even surprised?). Hot on the heels of last year’s RAISR, an efficient and accurate super-resolution technique that innovated by combining the best of the old and the new, comes Deep Bilateral Learning for Real-Time Image Enhancement, a mobile processing technique fast enough to retouch high-resolution images in real time.

Other solutions to this problem already exist, and the authors briefly survey them in their paper. But the real innovation of the new algorithm lies in the “black box” nature of the software and in its small size of only about 100 megabytes. This lets it run smoothly in real time on mobile devices (thanks, TensorFlow!) while still retaining high fidelity to the original source.

There are subtler, more creative breakthroughs in this algorithm, too. Most high-resolution processing techniques, both traditional ones and more recent deep learning solutions, either process a downscaled, low-res copy of the image and then apply super-resolution to the result, which is costly and liable to lose detail, or do memory-intensive processing directly on the full-resolution original.

MIT and Google’s contribution is to perform the image transformation through learned features on a low-resolution version of the image first, and then output a set of local affine color transformations (each one maps a pixel’s color to a linear combination of its color channels plus an offset). Because these simple transformations are then applied to the original full-resolution pixels, guided by the image’s edges, sharpness and definition are preserved. The way they do this is very interesting, but you can think of it as creating a “recipe” for a transformation, stored as a compact grid of coefficients that can then be scaled up and applied, instead of doing inefficient pixel upsampling or running convolutions at many megapixels of resolution. This, the authors claim, is orders of magnitude faster than the current state of the art, and it can apply complex transformations such as HDR and human-like retouching live, before the image is even captured.
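To make the “recipe” idea a little more concrete, here is a minimal NumPy sketch of the pipeline as described above. It is not the authors’ code: the grid layout `(gh, gw, gd, 3, 4)`, the fixed luminance guide, and `toy_predict_grid` (a hand-written exposure tweak standing in for the trained network) are all illustrative assumptions. What it does show is the shape of the approach: a small grid of affine color matrices is computed from a downsampled copy, then trilinearly interpolated (“sliced”) at every full-resolution pixel and applied to the original colors.

```python
import numpy as np


def luminance(image):
    """Guidance map in [0, 1]; a fixed luma formula is assumed here."""
    return image @ np.array([0.299, 0.587, 0.114])


def toy_predict_grid(low_res, gh=16, gw=16, gd=8):
    """Hypothetical stand-in for the learned network.

    Looks only at the low-res copy and returns a grid of 3x4 affine
    color matrices. Here every cell holds the same global exposure
    gain, chosen to pull the mean brightness toward 0.5.
    """
    gain = 0.5 / max(float(low_res.mean()), 1e-6)
    grid = np.zeros((gh, gw, gd, 3, 4))
    grid[..., :3, :3] = np.eye(3) * gain  # offset column stays 0
    return grid


def slice_grid(grid, guide):
    """Trilinearly interpolate the grid at every full-res pixel.

    grid:  (gh, gw, gd, 3, 4) affine matrices
    guide: (H, W) guidance values in [0, 1], used as the depth axis
    returns (H, W, 3, 4) per-pixel affine matrices
    """
    gh, gw, gd = grid.shape[:3]
    H, W = guide.shape
    # Continuous grid coordinates for each pixel (cell-centered).
    ys = (np.arange(H) + 0.5) / H * gh - 0.5
    xs = (np.arange(W) + 0.5) / W * gw - 0.5
    y = np.broadcast_to(ys[:, None], (H, W))
    x = np.broadcast_to(xs[None, :], (H, W))
    z = guide * gd - 0.5
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    z0 = np.floor(z).astype(int)

    out = np.zeros((H, W, 3, 4))
    for dy in (0, 1):
        for dx in (0, 1):
            for dz in (0, 1):
                # Clamp to the grid edges; weights still sum to 1.
                yi = np.clip(y0 + dy, 0, gh - 1)
                xi = np.clip(x0 + dx, 0, gw - 1)
                zi = np.clip(z0 + dz, 0, gd - 1)
                # Product of the three 1-D linear interpolation weights.
                w = ((1 - np.abs(y - (y0 + dy)))
                     * (1 - np.abs(x - (x0 + dx)))
                     * (1 - np.abs(z - (z0 + dz))))
                out += w[..., None, None] * grid[yi, xi, zi]
    return out


def apply_affine(image, coeffs):
    """Multiply each pixel's [r, g, b, 1] by its own 3x4 matrix."""
    H, W, _ = image.shape
    homog = np.concatenate([image, np.ones((H, W, 1))], axis=-1)
    return np.einsum("hwij,hwj->hwi", coeffs, homog)


def enhance(image, factor=8):
    low_res = image[::factor, ::factor]          # cheap strided downsample
    grid = toy_predict_grid(low_res)             # "recipe" from the low-res copy
    coeffs = slice_grid(grid, luminance(image))  # per-pixel 3x4 matrices
    return np.clip(apply_affine(image, coeffs), 0.0, 1.0)


if __name__ == "__main__":
    img = np.full((64, 48, 3), 0.25)
    out = enhance(img)
    print(out.shape, out.min(), out.max())
```

With a flat 0.25 gray input, the toy predictor sees a mean brightness of 0.25 and picks a gain of 2, so every output pixel comes out at 0.5 (up to floating-point rounding). The design point is that the expensive prediction step only ever looks at the small copy; the full-resolution work is just an interpolation and a 3x4 matrix multiply per pixel.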

Prisma and Snapchat have redefined what we thought of as mobile photo filters, and what we thought of as accessible to the everyday consumer. We’ve reached a point where users can tap a button and almost instantly see themselves as a van Gogh or a Rembrandt, and where they can hold entire video conversations while being convincingly transformed into cats. Real-time data, image and video augmentation is not just one aspect of the future of data. In many ways, it IS the future of data.

As Google, MIT and other creative innovators continue to show us, that future is an undiscovered country, whose successful traversal will require more than scaling current solutions or throwing more processors at our problems. It will take ingenuity. It will take systems, services and engineers who understand that mobile data processing is no longer a way to merely store information about the world and our lives — it is a way to see and experience them, as well.

