It could be my advancing years – I seriously doubt that – but every morning I dread turning on my MacBook Pro, fearful of the data deluge that it will bring and the day-long struggle to find the information I need to get things done that will ensue. And I’m not the only one caught in this quicksand-like avalanche of digital data.
According to market research firm comScore, in May the total number of Internet searches conducted in the U.S. alone was about 10.7 billion — up nearly 20 percent from 9.1 billion searches in May 2007. Those numbers make clear that we’re all searching for more information. What they don’t make clear is that often we don’t find what we’re looking for, and so end up trying again and again.
The problem is that there’s too much data coming online too quickly, and the traditional method of search that involves first finding and then consuming the information is not going to work for much longer. There just won’t be enough time for us to do that and still have a life. It’s a problem, and therefore solving it is an opportunity — a very big opportunity.
Earlier this week I was going through the digital detritus that has accumulated on my computer when I stumbled upon an old slide presentation made by Google back when they were still tiny. One slide estimated that by 2002, there would be about 500 million searches a day and between 3 billion and 8 billion web pages. Now those numbers seem so last century, for every day the amount of information online continues to grow at an exponential rate. It’s nearly impossible to calculate the exact number of web pages that are out there, but a good yardstick would be data from Netcraft, which tracks the number of servers on the Internet and says that the number of active domains almost quadrupled from 2002 to 2007. The total number of web sites at the end of April stood at over 162 million.
Many of these new sites are courtesy of Web 2.0 technologies that have allowed for the easy creation of digital data. Blogs, social networks, RSS feeds, Flickr feeds, Twitter messages, video clips…the data just keeps growing and growing, much like the proverbial Energizer bunny. And the problem of data overload is going to get even bigger as devices such as the 3G iPhone, with their fast wireless connections, make the on-the-go creation and sending of videos, messages and photos to our friends even easier.
If someone can become the Dolby of the web — remove the noise and give us clear sound — then they are going to make a lot of money. And when I say sound, I mean data that is truly useful. But that would just be the start.
Pip Coburn, who runs his own investment firm in New York, recently pointed out that “It’s not data that’s important, but what you do with it.” A good example of that would be a tiny startup called Summize, which is reportedly being acquired by Twitter.
Summize has come up with a clever way of peering through Twitter’s vast data stream and finding out what’s hot, where and how. The results are essentially keywords — topic-, person- or location-based — and thus can be used to show contextual advertising next to the pages that show these results. In other words, Summize has developed an ability to monetize conversations without being intrusive.
The possibilities of what a similar service could do with this data are endless. Imagine a service that would scroll through all the Flickr photos and Twitter messages and marry them to data on the Internet, such as nearby mass transit stations, Starbucks, movie theaters and grocery stores.
All this information would show up on your phone, but you would only see the options in, say, a 100-meter radius that could be increased by zooming out. It would be the ultimate mash-up of various web data sources offered to you as an application, and such applications would make it possible to find, consume and share information — without even trying.
Almost like serendipity! How’s that for a business model?
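To make the 100-meter idea above a little more concrete, here is a minimal sketch of the radius filter such a mash-up would need. It is an illustration only, not how any existing service works: the `Place` type, the `nearby` function, the sample places and their coordinates are all made up, and distances use the standard haversine formula with a mean Earth radius.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


@dataclass
class Place:
    # Hypothetical record for one item pulled from some web data source.
    name: str
    lat: float
    lon: float


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(places, lat, lon, radius_m=100):
    """Return the places within radius_m of (lat, lon), nearest first."""
    hits = [(distance_m(lat, lon, p.lat, p.lon), p) for p in places]
    hits = [h for h in hits if h[0] <= radius_m]
    return [p for _, p in sorted(hits, key=lambda h: h[0])]


# Made-up sample data: three places north of a user standing at (40.7500, -73.9900).
PLACES = [
    Place("Movie theater", 40.7600, -73.9900),    # roughly 1.1 km away
    Place("Transit station", 40.7520, -73.9900),  # roughly 220 m away
    Place("Coffee shop", 40.7505, -73.9900),      # roughly 55 m away
]

close_by = nearby(PLACES, 40.7500, -73.9900)                    # default 100 m view
zoomed_out = nearby(PLACES, 40.7500, -73.9900, radius_m=500)    # after zooming out
```

With this sample data the default view holds only the coffee shop, and zooming out to 500 meters adds the transit station. A real service would also need a spatial index so it does not compare the user against every place on every request.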
This post was originally published on BusinessWeek.com.
Well, first you would have to know what Information is, or how and when data becomes Information. If you take it a step further you get into Knowledge, how it relates to Information and how it’s used. And soon you end up writing an equation for consciousness, because this all has to fit together.
Just throwing terms around, believing we all share the same definition, is a stretch to say the least, and it is repeated by all the so-called Information workers out there.
BTW, trying to get this even close enough to be right based on a Boolean system or algorithm will also be a stretch, or as far as I can see, impossible.
Anyway, the next thing will be talk about Intelligent systems, but if you have solved the equations for the above problems you will realize there is no one such thing (Intelligence).
Ok, I’m grumpy today.
I don’t give a damn about what’s hot…I just want to get answers to my searches (often very niche) without fluff.
Yes. Add to this the fact that most people spend at least 8 hours a day at a job trying to make decisions based on data that is unique to their business (and not searchable). Yet, we continue to build and use systems at work that constrain data like the early catalogue systems of the Web. At least in our “personal lives” we can use search engines like Google to try and cut through the clutter. The challenge is that business runs on infrastructure and data and there is an increasing amount of it. There is much work to be done in Search for sure, especially in the enterprise. A few months ago we posted a blog on what we thought the Perfect Search Engine for business would look like – https://community.paglo.com/blog_topic/index/57-perfect-search
Brian de Haaff, Paglo
Spot on…turning data into information is no longer sufficient. We need actionable insights that are contextual. It is well within reach, even using today’s technology, to accomplish that, but at some point one has to deal with behavioral and personally identifiable info. Would people make the trade-off between richer information and some degree of loss of privacy? Should that be regulated, or would users be comfortable with a “do no evil” corporate mission?
To me your article brings to mind aggregation and meta sites.
I don’t necessarily want to plow through umpteen different interfaces to get the information contained therein. Sites like PopURLs and (the egregious copy) alltop.com make a decision about what sites you will want grouped and bring it all into a single interface.
Our site ScrapeUp.com attempts to do the same thing with video aggregators.
That’s interesting! It never occurred to me how confusing it will be to classify those sites as they would be pulling from as many other sites as possible.
Still, the filters would be the determining factor in how relevant the info would be to the query. I guess a few hits that come close to the need would be deemed more valuable than too many that only touch on what we might call ‘border’ answers.
Funny, as I seem to be talking about a similar concept called ‘network’. Or maybe the meaning of ‘rich’.
Best.
alain
mor.ph
Isn’t the context you’re calling for part of the hope for the semantic web? Once those ontologies and databases are set up isn’t that the first level filter that determines which data belongs together? The second filter would be some type of recommendation engine that could learn your individual preferences?
Summize and most of the services that focus on tracking one or two media don’t really help when it comes to parsing social media activity as a whole. The conversational, one-to-many nature of social media means you need discovery tools that not only bring back results from across the ecosystem but also give you extensive analysis capabilities: who is talking, where are they, are they positive or negative, and how influential are they? These demographic, sentiment and authority measures help sort the bewildering array of conversations out there.
Internet= Information
Social Media= Communication
Two very different things.
@Stacey
I don’t know. The semantic web reminds me a lot of OO programming.
Problem is, the real world is not as clean. Data is messy, incomplete and inconsistent, and the meaning can change over time and across cultural boundaries.
I don’t think it has great advantages over keyword search. Might be good for some things but in general it’s just hype.
Om
Excellent piece on the signal-to-noise issue we face on a daily basis. I have been thinking along similar lines. In a moment of “Serendipity” I created a concept called the Chaos Score. This basically looks at how our inputs (web, RSS, apps, devices) affect our information flow in a way similar to Metcalfe’s law and the network effect. You can see the complete post at http://groupswim.wordpress.com/2008/07/14/tmi-and-the-chaos-score-metcalf’s-law-applied-to-modern-productivity/
The point is we, as consumers of this data, can significantly improve our signal to noise ratio by rationalizing down our inputs. I believe you will not “miss” anything by cutting out a few feeds or a redundant application or two.
So, what is your Chaos Score?
I believe there are services (or efforts to bring in these services) in Europe which allow mobile users to receive updates on local events and destinations like retail stores, restaurants, etc. within a set limit. It works like this: when a mobile user is at a city center, say Delhi’s Connaught Place, he will receive updates from the operator about his chosen services: pubs in the area, cinema listings and the like.
Taking this a step ahead, the system can let the user know of friends within his vicinity, or maybe more interesting things based on his interests.
Truly a great business model…to think of. To implement? Let’s wait for a pioneer.
Manish Pahuja