Jimmy Wales, the founder of not-for-profit Wikipedia and for-profit, San Mateo, Calif.-based Wikia, is one of a growing number of people discomforted by the control Google has over search. And he is doing something about it. His company, Wikia, last week bought the distributed crawler Grub from LookSmart and plans to make it available as open source. Not that LookSmart was really using it anyway, and the company also did ad business with Wikia.
Wales’ bet: just as Linux became a migraine for the monopolist of the last generation, open-source search tools will keep companies like Google honest. It is not an easy task, for Google is firmly embedded in our digital lives.
“Search is part of the fundamental infrastructure of the Internet. And, it is currently broken,” Wales said back in December 2006, when Wikia launched the Search Wikia effort. “Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency.”
Wales launched Search Wikia earlier this year, and the Grub acquisition is part of that strategy. (You can run Grub on your Windows or Linux-based PC, either in the background or as a screensaver.) Following the announcement, we spoke with Wales, who explained that with Grub and other tools such as Lucene, open-source indexing software, innovation around search can thrive.
By marrying these search results with the human context provided by Wikia wikis, the final search results could actually become useful once again. Grub, Lucene and Nutch (a web crawler based on Lucene) are the powder and spark of the open search revolution.
Grub is not by any means the final move, and should be viewed as a first concrete step in a long-term strategy. Jeremie Miller, inventor of Jabber and the XMPP protocol, who is leading the Search Wikia efforts (and is also CTO of Wikia), gave a talk at OSCON about the architecture of open-source search. Miller pointed out that monolithic search can be broken into three components, and interested parties could implement one or more of them.
The three components are: factories that crawl and present content; collectors that rate and rank content from multiple sources; and brokers that direct user queries to the collectors or factories. Miller believes that this is a five-year process. Grub is one of the many components that will be needed to build a truly open-source search infrastructure. The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo, for that matter) is the high cost of infrastructure.
Sure, Amazon’s EC2 service has helped, but it isn’t enough. Google, thanks to its money machine, has been able to build an infrastructure that lets it crawl, index and show results at a faster pace. Even if a start-up comes up with a better algorithm, it still needs to sink millions into infrastructure just to get into the business and offer the fast experience most people associate with Google.
Grub, on the other hand, is a way to build a massive, distributed, user-contributed processing network. Another nascent but promising open-source P2P search engine is Yacy, which comes out of Germany. (Also check out Faroo, a German P2P search start-up.)
Can it work?
Wales faces an uphill climb. First, he has to ensure that enough people are using Grub and, more importantly, hacking enhancements to the software. At the same time, he has to address other concerns, as pointed out by this commenter on Search Engine Land and on other blogs.
While Google might be impossible to beat in a full-frontal assault, it is vulnerable to smaller, more focused attacks. Linux may not have been able to kill Microsoft, but it has stolen opportunities from the OS giant, and it has been particularly effective in Internet infrastructure (data centers).
Open-source search can do precisely the same: take away opportunities from large search engines. Perhaps, as with Linux, we will see a shift away from Google, and venture capitalists, long scared by the prospect of competing with Google, will loosen their purse strings.
If Linux ended up spawning devices as diverse as TiVo and mobile phones, open-source search can lead to many more specialized search engines, also called vertical search engines. Today, building a good vertical search engine costs millions of dollars, and building and operating one is not for the faint of heart.
In an interview with Fast Company magazine earlier this year, Wales quipped:
“The other thing we’re looking to is some of the second-tier search companies,” he admits. “We’ve talked to–I can’t say who–different people, asking, would they be better off participating in a project that helps quality search results to become a commodity?”
Put another way, Wales is hoping for death by a thousand cuts for the search incumbents.
More @ Resource Shelf.
48 thoughts on “Google vs Jimmy Wales & Open Source Search”
Great article, very exciting stuff. You should fix the link to Grub, however. It’s grub.org not grub.oom. Also, looks like you forgot to close the link tag after nutch.org because it goes on for several paragraphs.
You left an anchor tag open for multiple paragraphs.
Wow. I agree a change to that extent will take time, but with the movement towards open source “everything” the internet might be ready to handle this. The real question is how far are hackers and software developers willing to go to really make this mean anything at all against giants like google?
thanks for the catch. fixed it. something went wrong when posting in wordpress from my blog editor.
Just when I blast you, you come out with a good, albeit dated, article.
You need an editor. The grammar mistakes make this barely readable.
The big issue with Wikia in particular is how many users will it find who are willing to contribute, considering that it is a for-profit entity. I for one, contributed heavily to Wikipedia, but will not touch Wikia.
On the other hand, Grub looks really interesting, and should be fantastic. However, I think Google won’t be too worried, since it is good enough for most people, and it is far too entrenched in our collective mindset.
I bet Google is laughing hysterically now. Jimmy’s biting off more than he can chew. From the outside everybody’s a genius.
“You need an editor. The grammar mistakes make this barely readable.” He needs to learn how to write. In the short deadline world of blogging, the bloggers need to know grammar, know when to look things up in dictionaries (hyphen? open? closed?), and know the AP or NYT style manual.
I for one look forward to replacing my Google toolbar with the Grub toolbar the day it is released. Google was good but now sucks, the search results have become so bad I have given up on searching
“The grammar mistakes make this barely readable.”
Yup. I’m glad someone else noticed. Very annoying.
You have mistakes in your article:
“…part of a growing number of people who are discomforted by the growing control Google HAS over search.” (the word HAS is missing)
“…it is impossible to get away from Google that WHICH is firmly embedded into our digital lives.”
(The word “which” is a more appropriate choice than “that”)
“…Wales launched Search Wikia earlier this year, and THE Grub acqusition is part of that strategy. ”
(the word THE is missing)
“…By marrying these search results and the human context provided by says Wikia wikis, the final search results could actually become useful once again”.
(What is supposed to follow the preposition “by”? This sentence has structure issues)
“…Miller in his talk pointed out that the monolithic search can be broken into three components…”
(Either use a comma after “Miller”, or restructure as…”In his talk, Miller pointed out…”)
“…The three components are – factories that crawl, present and present content;”
(You use the word “present” twice)
“Google, thanks to its money machine has been able to build an infrastructure that lets it crawl, index and show results at a faster pace”
(You should add a comma after “machine”)
“Even if a start-up comes up with a better alogrithm, it still needs to sink millions into infrastructure, to just get into the business, and offer a fast-experience most people associate with Google.”
(This is a run-on sentence and would be best as two)
“Grub, on the other hand is a way to build massive, distributed user-contributed processing network, and can help offset with the power of a wiki to form social consensus, the open source Search Wikia project has taken the next major step towards a future where search is open and transparent. ”
(This is a perfectly terrible sentence. Missing words (the word “a” before “massive”) and unconnected fragments. It is not clear what is being said)
“Another nascent but promising open source P2P search engine, Yacy, coming out of Germany”
(You are missing the word “is” in front of “coming”)
“Wales faces an uphill climb. First he has to ensure that there are enough people using Grub, and are more importantly are hacking enhacements to the software.”
(There are too many “ARE” words. This sentence also has structure problems)
“Perhaps like Linux, we will see a shift away from Google, and Venture Capitalists, for long scared by the prospect of competing with Google, will loosen their purse strings.”
(The preposition “for” is not needed)
I agree with Bevis. You need an editor in a bad way. I am amazed that you have one awards for your writing.
Excellence in Journalism Ward
The gold award from American Society of Business Publication Editors
Senior Editor at Forbes.com
You have to be joking. There is no way that this article was written by a person with those credentials.
I agree with the comments about the poor quality of the writing. Om, step it up! We expect much more from you!
At Om’s request I just went through and did a light copy-edit.
Sorry about the grammatical errors. I posted the wrong draft, and it was pretty late at night, so I apologize for the mistakes.
I am sorry for the errors!
@Amazed Reader: if you are going to criticize other people’s writing you could at least do something about your own. “I am amazed that you have one awards for your writing.”. I think the word you wanted was “won”.
A few grammatical errors and some spelling mistakes render the article unreadable to some people? Now, that is amazing.
This is a little ballsy of Wales to go after Google in such a way. It’s a little weird to see Wikipedia acting like the proud beggar with the hat in their hand meanwhile their boss is trying to figure out how to get rich.
And “lack of freedom, lack of community, lack of accountability, lack of transparency” is pretty rich. Aren’t those the major problems with Wikipedia?
the irony of google is that if you type any subject, the results on the first page will most likely have a link to a wikipedia.org web page containing that subject.
It seems like Grub wouldn’t be all that helpful for breaking news type content since there is an additional level of processing. If a search engine can’t be counted on for EVERY type of search, I don’t know how it will gain traction. However, I’d love to be proved wrong.
It’s great that Wales is trying, but Wikipedia is fundamentally different from search – the difference between structured and unstructured data – so there’s no reason to believe that he will be any more likely to succeed than all the other smart people out there.
The vast majority of readers are looking for up-to-date information on the Web from a trusted source. Om delivers on that promise, which is why I read his blog. I hope that Om doesn’t recruit “editors” to sanitize the personality of GigaOm (which would also slow things down) in order to appease a tiny minority of readers intoxicated with the finer points of English grammar.
I was going to mention Nutch, but you mentioned it, oh well. I am a big fan of human powered search, but no one seems to have perfected this method yet. I had actually begun pursuing venture capital several years ago, but abandoned it for personal reasons at the time.
I consider myself a bit of a litmus test for change. I’m usually in the last 10% of people to “get it”, and I am “getting” fed up with google giving me advertising results instead of search results. I’m sold on the idea that an OS project can take on google. Just getting someone like me to write a comment should be a sign that ordinary people are ready to support something new. And no, it’s not lost on me that I read this article from a link from google-news, who obviously find it funny that such an article is being ridiculed because of grammar, not content.
Informative and interesting article.
I’ve tried installing GRUB on my PC. But, as I’m accessing net through a proxy, GRUB couldn’t pass through. I will try again on my laptop.
As it’s already been said, it’s a long way to compete with the search giants like google, et al. Nevertheless, it’s a credible start.
We definitely need an alternative to Google for searching. And if Grub can fill the space, nothing like it, as it’ll get inputs from the open source communities. The search results should be more oriented towards the query’s context rather than just keywords or phrases. Great article.
It’s going to be pretty tough for open-source companies like this to compete against giants like Google, but hopefully Jimmy Wales will succeed in keeping Google honest.
I look forward to a future in which most people use a Grub toolbar or something else instead of the Google Toolbar, and Google is no longer the standard. But in reality, Google is the undisputed winner of the internet, at least today.
I think the problem is that Google have done a tremendous job on search. It will not be easy to provide a better search experience than google. When Linux started to gain market share for web infrastructure it was because there were so many faults with the microsoft web server. Also Microsoft ask the end user to put their hand in their pocket so it is easier to attack them with a free (beer) product. Google have an advertiser pays model so if I’m happy with the results I get from google, I have no reason to use another search engine.