Jimmy Wales, the founder of not-for-profit Wikipedia and of for-profit, San Mateo, Calif.-based Wikia, is one of a growing number of people discomforted by Google's increasing control over search. And he is doing something about it. His company, Wikia, last week bought the distributed crawler Grub from LookSmart and plans to make it available as open source. Not that LookSmart was really using it anyway, and it also did ad business with Wikia.
Wales’ bet: just as Linux became a migraine for the monopolist of the last generation, open-source search tools will keep companies like Google honest. It is not an easy task, for Google is firmly embedded in our digital lives.
“Search is part of the fundamental infrastructure of the Internet. And, it is currently broken,” Wales said back in December 2006, when Wikia launched its Search Wikia effort. “Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency.”
Wales launched Search Wikia earlier this year, and the Grub acquisition is part of that strategy. (You can run Grub on your Windows or Linux-based PC, either in the background or as a screensaver.) Following the announcement, we spoke with Wales, who explained that with Grub and other tools such as Lucene, an open-source indexing library, innovation around search can thrive.
Marrying these search results with the human context provided by Wikia wikis could make the final search results actually useful once again. Grub, Lucene and Nutch (a web crawler based on Lucene) are the powder and spark of the open search revolution.
Grub is by no means the final move; it should be viewed as a first concrete step in a long-term strategy. Jeremie Miller, inventor of Jabber and the XMPP protocol, who is leading the Search Wikia efforts (and is also CTO of Wikia), gave a talk at OSCON about the architecture of open-source search. Miller pointed out that monolithic search can be broken into three components, and interested parties could implement one or more of them.
The three components are: factories that crawl and present content; collectors that rate and rank content from multiple sources; and brokers that direct user queries to the collectors or factories. Miller believes that this is a five-year process. Grub is one of the many components that will be needed to build a truly open-source search infrastructure. The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo for that matter) is the high cost of infrastructure.
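To make the split Miller describes a little more concrete, here is a minimal Python sketch of how factories, collectors and brokers might fit together. It is an illustration only, not Search Wikia's actual design: every class, method and URL below is a hypothetical name invented for this example, the "crawl" is just an in-memory dictionary of pages, and the rating is a toy count of how often the query appears.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    url: str
    text: str


@dataclass
class Factory:
    """Hypothetical factory: holds crawled pages and presents matches."""
    name: str
    pages: Dict[str, str] = field(default_factory=dict)  # url -> page text

    def search(self, query: str) -> List[Document]:
        q = query.lower()
        return [Document(url, text)
                for url, text in self.pages.items()
                if q in text.lower()]


@dataclass
class Collector:
    """Hypothetical collector: rates and ranks content from several factories."""
    factories: List[Factory]

    def ranked(self, query: str) -> List[Document]:
        q = query.lower()
        hits = [doc for f in self.factories for doc in f.search(query)]
        # Toy rating (an assumption): more occurrences of the query rank higher.
        return sorted(hits, key=lambda d: d.text.lower().count(q), reverse=True)


@dataclass
class Broker:
    """Hypothetical broker: directs a query to the collector or to one factory."""
    collector: Collector

    def route(self, query: str, source: Optional[str] = None) -> List[Document]:
        if source is None:
            return self.collector.ranked(query)
        for f in self.collector.factories:
            if f.name == source:
                return f.search(query)
        return []  # unknown factory name


# Usage: two independent factories behind one collector and one broker.
wiki = Factory("wiki", {"http://example.org/a": "open search, open search tools"})
news = Factory("news", {"http://example.org/b": "search start-ups"})
broker = Broker(Collector([wiki, news]))
results = broker.route("search")
```

The point of the sketch is that each piece can be run by a different party: anyone could stand up a factory, and a broker only needs to know which collectors or factories to ask.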
Sure, Amazon’s EC2 service has helped, but it isn’t enough. Google, thanks to its money machine, has been able to build an infrastructure that lets it crawl, index and show results at a faster pace. Even if a start-up comes up with a better algorithm, it still needs to sink millions into infrastructure just to get into the business and offer the kind of fast experience most people associate with Google.
Grub, on the other hand, is a way to build a massive, distributed, user-contributed processing network. Another nascent but promising open-source P2P search engine is YaCy, coming out of Germany. (Also check out Faroo, a German P2P search start-up.)
Can it work?
Wales faces an uphill climb. First, he has to ensure that enough people are using Grub and, more importantly, hacking enhancements into the software. At the same time, he has to address other concerns, as pointed out by this commentator on Search Engine Land and other blogs.
While Google might be impossible to beat in a full-frontal assault, it is vulnerable to smaller, more focused attacks. While Linux may not have been able to kill Microsoft, it has stolen opportunities from the OS giant. It has been particularly effective in Internet infrastructure (data centers).
Open-source search can do precisely the same: take away opportunities from large search engines. Perhaps, as with Linux, we will see a shift away from Google, and venture capitalists, long scared by the prospect of competing with Google, will loosen their purse strings.
If Linux ended up spawning devices as diverse as TiVo and mobile phones, open-source search can lead to many more specialized search engines, also called vertical search engines. Today, building a good vertical search engine costs millions of dollars, and building and operating one is not for the faint of heart.
In an interview with Fast Company magazine earlier this year, Wales quipped:
“The other thing we’re looking to is some of the second-tier search companies,” he admits. “We’ve talked to–I can’t say who–different people, asking, would they be better off participating in a project that helps quality search results to become a commodity?”
Put another way, Wales is hoping to inflict death by a thousand cuts on the search incumbents.
More @ Resource Shelf.