December 10, 2014

On the visual web, a photo is worth more than 1,000 words

Photos, photos and more photos! Photos are the atomic unit of social platforms. Photos and visuals are the common language of the Internet. It is hardly a surprise, then, that we are going to upload nearly 900 billion photos to the Internet this year. Or that Snapchat is worth tens of billions of dollars. What is more amazing is that we have barely scratched the surface when it comes to the potential of turning on the cameras. Here is a personal essay on why I am excited about photos, the visual web, computer vision and the far-reaching impact of visual sensors.

I started taking pictures when I signed up for Instagram. Before that, I wasn’t too keen on taking them, mostly because I was too lazy to buy a camera. While Nokia’s camera phones were not perfect, they showed the world one simple reality: The future of the camera was mobile, connected and social.

At that time I didn’t understand the emotional value of pictures. When I was growing up, my family’s means were modest, so we didn’t have a camera. Once a year we would dress up and go to a photo studio in our neighborhood (it was called Shankar Studio, and it is still there), and that would be it. Or on special occasions such as birthdays and weddings, some friends, family members or professionals would bring a camera.

Though the memories from my childhood, teenage years and even college years are vivid, tacit and real, only some of them have pictures. Most are triggered by a song, movie or place. I can distinctly remember where I had my first cigarette, who I was with and the feeling it evoked. I bet most people in the U.S. have different ideas about memories: After all, in the U.S., everybody has cameras. And as a result, there are a lot of pictures of you when you were growing up. I didn’t have one — so all of my memories are in my head.

When I started using Instagram, I realized that beyond being pictures, these images are also a daily diary of how I am feeling and what I am looking at. Snapchat has only turbocharged that behavior. For youngsters — teens like my nieces — the old idea of “taking” a photo as a deliberate act feels old-fashioned.

One of my nieces keeps Snapchat open and essentially shares every few minutes; she pointed out that with her generation, there is a much lower barrier to snapshotting and sharing pictures. This generation has never felt any constraints: They are living in a time of plentiful bandwidth, unlimited storage, increasing quality of cameras and no legacy around photos. It is hardly surprising that pictures are free and ephemeral for this group!

But it was only after Instagram and that conversation that pictures started to become more interesting to me. Why? Because they’re about the memories embedded in the photo. It isn’t just me who feels this way. My friend Sophie Lebrecht, the CEO and founder of True Ventures–backed Neon Labs, often reminds me about the power of images. “An image is the gateway to your emotional memory,” she tells me. “And on the visual web an image is the gateway to accessing almost all content and information.”

Today images, she says, “are becoming a touch point for navigation on mobile and other connected devices.” As the popularity of Instagram and Pinterest demonstrates, we are adapting to a different kind of web, one that will be increasingly visual. And as more and more images populate the web, it’s getting harder and harder to find them. That’s exactly where the opportunity lies.

A Very Visual Web

“A landscape image cuts across all political and national boundaries, it transcends the constraints of language and culture.”
— Charlie Waite

As the web is populated with more and more things, our ability to consume information is going to be limited by how much we can absorb. This is a problem for text in general and the word-based web in particular. Why? We are built to process visual data. When we walk into a room, we look around and almost instantaneously make an assessment of our surroundings (for example, we notice who’s sitting where). With text, though, we have to read, internalize and contextualize. That’s three steps, but visual elements require only one. The reason Pinterest has taken off is that images are so much easier and faster to navigate. That’s why the web is increasingly becoming visual.

But more importantly, the visual web has another thing going for it: You don’t need a translation. A smile is a smile is a smile in Chinese, Hindi, Brazilian Portuguese, Spanish or English. The emotional payload of a picture is universal. Everyone understands that the image of a train stands for a real-life train or that a cup with steam means something hot to drink. It’s a visual metaphor for the words behind it. The human brain is good at recognizing visual symbols: A scientific study shows that our brains take about 150 milliseconds to recognize a symbol and another 100 milliseconds to know what it means.

“For twenty thousand years, humans have used drawing as a way to communicate ideas, to share their experience of the world,” Scott Thomas, the co-founder of The Noun Project, told the audience at the Gigaom RoadMap event in 2013. “The visual language simplifies one of the most basic elements of being human: the ability to communicate with people, even if language isn’t an option.” That’s where we are all headed, especially now that the web is a lot more international and more connected.

Photos, Photos Everywhere

“For one thing, there are a great many more images around, claiming our attention. The inventory started in 1839 and since then just about everything has been photographed, or so it seems.” — Susan Sontag, On Photography

Every day 60 million new photos are shared on Instagram. Three hundred million photos are shared daily on Facebook alone. That doesn’t even include Snapchat, Google and Twitter. In May 2013 Mary Meeker of Kleiner Perkins Caufield & Byers estimated we were uploading 500 million photos every day. A year later she estimated that number was up to 1.8 billion photos every day. We are living on a planet that is captured every second, from every angle. Pete Warden, the co-founder of Jetpac (who is now at Google), often points out that we are living with a satellite view of the earth, taken at ground level. It is an apt metaphor. It is amazing how much information we share on a daily basis. And we’re sharing it not as text but as photos. Yet it is harder and harder to find our photos, which means it’s harder and harder to extract the information embedded in them.

Going back to my own obsession with Instagram, I look at my feed only occasionally, because it is still difficult to find photos from my history. Simplifying this process would actually be my No. 1 ask from team Instagram. So I’ve started to use tags to classify photos and call them up quickly. Mostly I classify by city and date, but it is a primitive system. I wish the machines did this, and faster. (It’s worth noting that Rick Klau of Google Ventures scanned all of his family albums and found that Google Plus’ #autoawesome feature resurfaced magic moments from the past almost automatically. More of this, please!)

What the world needs is not filters to apply cool effects to photos but filters to surface photos. For instance, we still haven’t come up with a way to turn back the proverbial clock and develop, for lack of a better term, time swiping: the ability to go back and forth in time. Right now it is virtually impossible to find that photo from six months ago, even though you have a vague recollection that you took it. You’d think Facebook would do this, but it is busy wiring the planet and tweaking your news feed. Personalizing your photo experience by helping you find the right pictures is a great opportunity for someone like Twitter or Facebook, but they are too busy with other projects.

And that leaves the opportunity wide open for startups. Until recently I used my Instagram history and Foursquare check-in history to keep track of where I had been. (I could use Timehop, but I don’t like using the app, as I don’t think it has nailed the experience.) The only available methods to surface and categorize photos are beyond basic; we need something intuitive and streamlined to navigate this new visual web.

The Next Great Photo Service

We desperately need a service that helps us create a visual timeline of our life. That app is even more important now that we are not making photo albums like we used to. Instead we are constantly taking discrete images and sending them out on the social web as streams. Some of us take photos and leave them on the disk, only to later spend hours looking for a specific image.

These thousands of photos represent our stream of consciousness. Any organizational principle will have to use data on how people search for photos across a broader set and then build a photo-search algorithm based on constantly shifting search patterns across millions of people. This photo-organizational effort has to go beyond, say, semantics (i.e., trees, buildings and other basics).

Just as Google search made us find relevant (or seemingly relevant) things, the next great photo service could essentially find relevant photos, map them to a location and events, and resurface them at an opportune moment, taking a cue from our social behaviors. Just as a company like SwiftKey brings us a smarter, more personal and intelligent mobile keyboard, this service could be an amalgamation of many things.

For example, we experience moments or interactions as feelings that are associated with objects, scenes and images. So perhaps when we later see the image, memories of those interactions, moments and feelings come back. There is just so much metadata in that image!

These timelines will not just be personal: We have come to a point in society where photos and videos are part of the larger sociopolitical dialogue. But how can we create a way for visuals to tell the near-history of our time?

A Fabric of Photos

Beyond this utopian service that creates a highly personal, visual timeline, there are plenty of opportunities in the visual web, from basic image analysis to using metadata to draw more inferences and better information. About half of our brain is devoted to visual processing, and if we are going to augment some of our thinking with machine intelligence, it is also important to make machines that make sense of the visual. I’m interested in a way to find data from photos and use it to solve a specific problem.

When I was in Sweden a few months ago, I sat down for coffee with Peter Neubauer, who in his past life was the co-founder of Neo Technology and the open-source database Neo4J. The conversation turned to the evolution of photography. Neubauer pointed out that with the explosion in digital photography, the value curve of visual information is going to shift from the stand-alone individual aesthetic of the artist to the collaborative and social aesthetic of services like Facebook and Instagram. Photos have always been tools of creative, artistic and personal satisfaction. But going forward, the real value creation will come from stitching together photos as a fabric, extracting information and then providing that cumulative information as a totally different package.

The Community Photo Collections project at the University of Washington’s GRAIL Lab is a good example. Five years ago the researchers embarked on an effort to construct a 3D model of the city of Rome based on 150,000 of the 2 million publicly available photos on Flickr. “Tourist photographs will never capture [a large] city in its entirety,” team member Sameer Agarwal told National Geographic. “Our hope is that we will ultimately be able to combine the Flickr photographs with something like Google Street View or aerial imagery like Microsoft Virtual Earth to build complete 3-D models of cities.” The platform, called PhotoCity, also became the underpinning of Microsoft’s commercial Photosynth service.

While five years ago it was expensive to run an experiment like this, the continually falling prices of cloud computing are going to make mining intelligence from the fabric of photos relatively cheap and open up new opportunities. This becomes even more exciting when you think about how cameras are changing our perspective on the world, adding new context to it from angles that the human eye either can’t see or tends to miss. For example, wearable cameras are emerging from companies like Narrative, and drones are becoming a rich source of photos known as “dronies.” Of course, this also opens up our fabric of photos to marketing and targeting companies.

I can think of several practical applications for this new fabric of photos. First, using the embedded geospatial data in photos can open up new opportunities. For instance, many new cars now come with cameras that assist with navigation and parking. These cameras capture photos that have geospatial data, and in aggregate, that data can help facilitate the more precise and focused location systems needed for semi- or fully autonomous cars.

Meanwhile Dropcams and other personal security cameras silently observe our world and capture information. One day that data could facilitate software that generates personalized, invisible grids on which our home robots can move (think Roombas and other personalized devices that make our lives easier).

But dream as I do of these things, there are many challenges around photos. For starters, images are big, which means high bandwidth and storage requirements. That makes it a difficult proposition for a startup. Another problem? There are too many damn pictures, and their numbers only keep going up. Some estimate that by the end of 2014 there will be 880 billion photos on the web. Beyond sheer volume, privacy challenges are the new reality of a utopian technological desire. Future photo services will have to tackle these familiar problems before they can move forward.

But I see a future in which a machine’s camera will make sure that our computer-assisted cars don’t crash, or will smartly reroute them to a less crowded street. Who knows? We have only just gotten started. Perhaps a picture will soon be worth a lot more than, say, 1,000 words.

Note: This essay was inspired by a conversation with Evan Nisselson of LDV Capital at his LDV Vision Summit in New York in the summer of 2014. Since then Sophie Lebrecht, Pete Warden, Mark Kawano, Peter Neubauer, Jan Erik Solem and Hiten Shah have shared their thoughts with me on various aspects of photography, photos and visual sensors.
