58 thoughts on “Even in Web 2.0 Scale & Size Matter”

  1. This infrastructure discussion is the “dirty little secret” of all the hyperbole surrounding Web 2.0. Just wait until there is global scaling required!

    I did a post about this dirty little secret right after the Web 2.0 Conference, since this part of the discussion was so glaringly left out.

  2. I have to disagree. While it’s embarrassing for Google to be overwhelmed by Analytics users, I think there is much more risk for small web 2.0 “companies” investing tens of thousands of dollars in infrastructure before figuring out how to turn a profit. That’s more like Bubble 1.0 if anything.

    Then again, I’m not quite sure what the business model behind companies like Del.icio.us really is, other than hoping to be bought before burning through the VC money? That might be the real problem.

  3. slashdoc, i think the issue is pretty clear… if you are going to be doing business, well, then you might as well prepare for the eventual issues of scale and size. the fact that google analytics did not scale is bad planning on their part, and inexcusable. it is even more critical for start-ups to capture the users who come their way. of course they can stay in controlled beta for as long as they want. sphere is clearly doing that.

  4. I write OSS server automation software, and I was at the Web 2.0 in hopes of learning how to apply some of these nifty ideas to my own development, and even though I went to learn I ended up feeling a lot more like a vendor. Very few people had any kind of infrastructure or really any idea how to successfully run web services.

    People need to realize that managing and scaling the services are different skills than creating them; either develop both skills or get to hiring, but don’t think that those machines will just run themselves.

  5. Scalability is not just about adding another box to the rack and letting a DNS round-robin scheme take over. Scalability must be designed into the application architecture. If scalability is designed correctly into the application, there are huge financial savings to reap. Distributed multi-database SQL queries are a must for any web service application that wants to scale.
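
    To make that concrete, here is a minimal sketch of hash-partitioning data across several databases (Python; the hostnames are made up, and a real driver would sit behind the lookup):

    ```python
    import hashlib

    # Hypothetical database hosts; each holds 1/N of the user data.
    SHARD_HOSTS = ["db0.example.com", "db1.example.com", "db2.example.com"]

    def shard_for(user_id):
        """Deterministically map a user onto one database host."""
        digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
        return SHARD_HOSTS[int(digest, 16) % len(SHARD_HOSTS)]

    # Every read and write for a given user goes to the same shard, so
    # capacity grows by adding hosts rather than by buying a bigger box.
    print(shard_for("alice"))
    ```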

  6. Jason is right on. There is little reason to spend much energy on “getting scalability right” at the outset. The energy is much better spent on figuring out how to get customers and make them pay. Scalability is fairly easy to provide when necessary. Most startups would kill to have scalability problems!!

  7. i think you folks are missing the point. i think ramana is spot on. it is about the right architecture, and the ability to think through the what-if problem. the issue is not throwing servers at the problem; it is building a scalable architecture, and that is what a lot of companies are not thinking through.

  8. Yeah. Totally. Scalability is the new ‘blog’.

    This is ALL I did at Rojo. Scalability 24/7… MySQL cluster design, distributed filesystem design, memcached, caching, web app performance.
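
    For anyone who hasn’t used memcached, the core of it is just a cache-aside loop in front of MySQL. A rough sketch, assuming the python-memcached client; load_from_mysql is a made-up stand-in for the real query:

    ```python
    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])

    def load_from_mysql(user_id):
        # stand-in for the real (expensive) database query
        return {"id": user_id}

    def get_profile(user_id):
        key = "profile:%s" % user_id
        profile = mc.get(key)                   # fast path: served from RAM
        if profile is None:
            profile = load_from_mysql(user_id)  # slow path: hits the DB
            mc.set(key, profile, time=300)      # keep it cached 5 minutes
        return profile
    ```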

    BTW if anyone needs a kickass scalability consultant they should send me a private email 🙂

    Kevin

  9. We’ve been dealing with these issues quite closely with Akismet and WordPress.com. The best advice I can give someone starting from scratch is to design for federated data if possible with your application. This is the single best decision we made with WP.com.
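
    In practice, federation can be as simple as a directory that records which database owns all of a given blog’s data, so any blog can be moved to new hardware on its own. A sketch; the hosts and names here are hypothetical, not our actual setup:

    ```python
    # Directory-based federation: one lookup tells you which database
    # owns a blog, and every query for that blog goes only to that host.
    BLOG_DIRECTORY = {
        "alice.wordpress.com": "db1.internal",
        "bob.wordpress.com": "db2.internal",
    }

    def host_for(blog_domain):
        return BLOG_DIRECTORY[blog_domain]

    # Moving a busy blog to a bigger box is just a data copy plus a
    # one-row directory update; no other blog is affected.
    ```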

    Second, high-quality hardware and load-balancing setups are so cheap these days that it’s really criminal not to invest in them if you care about your users. I would never want to be running an app whose users didn’t care if it was down for a couple of minutes a day, or slow.

    We’ve made mistakes, but you don’t have to: replicate your database, back up several times a day, always have a hot spare, keep DNS timeouts low, and TEST TEST TEST. There are open source tools today that do the exact same thing the $50k+ load balancers do.
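
    The core of what those expensive boxes do fits in a page of code: rotate across backends and skip any that fail a health check. A rough sketch (illustrative addresses, with a plain TCP connect as the health check):

    ```python
    import itertools
    import socket

    BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative web servers
    rotation = itertools.cycle(BACKENDS)

    def healthy(host, port=80, timeout=1.0):
        """Treat a backend as healthy if it accepts a TCP connection."""
        try:
            socket.create_connection((host, port), timeout).close()
            return True
        except OSError:
            return False

    def pick_backend():
        """Round-robin over the pool, skipping dead machines."""
        for _ in range(len(BACKENDS)):
            host = next(rotation)
            if healthy(host):
                return host
        raise RuntimeError("no healthy backends")
    ```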

    I’m on a budget, but I’m incredibly thankful that (a) so many users want to use our service that scalability is something to worry about, and (b) we’ve invested with heavy spikes and growth in mind. A few months ago, if a hard drive had burned out, the service would just have been down, likely for at least half a day. When it actually happened a few days ago, it just meant a slowdown until we were able to shift our application load.

  10. At PodTech, when I started the company, all my dollars went into infrastructure (no money went into marketing). The fact that PodTech never went down was the marketing. When podcasters were dropping like flies after iTunes came out, and then Yahoo Podcasts, PodTech stood proud. As a small company that was our proudest moment. High availability and clustering are key if your business is about Web 2.0.

  11. How many companies actually have scalability problems that are hardware-related? 99% of the time the problem is that the company has clueless developers who don’t know how to scale a product, and no amount of money you toss at it will change that.

    Before I built my site I worked at a good many companies during the end of the bubble years. It was amazing how sitting down and tuning the database for 20 minutes could get rid of the need for 80% of the infrastructure.
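
    Most of that tuning is nothing more exotic than adding the right index. You can see the whole effect with sqlite3, which ships with Python (MySQL’s EXPLAIN tells the same story):

    ```python
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hits (page TEXT, ts INTEGER)")

    # Without an index, the database scans every row on each lookup.
    plan = db.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM hits WHERE page = 'home'").fetchall()
    print(plan)  # ... SCAN hits

    db.execute("CREATE INDEX idx_hits_page ON hits (page)")

    # With the index, the same query becomes a cheap tree search.
    plan = db.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM hits WHERE page = 'home'").fetchall()
    print(plan)  # ... SEARCH hits USING INDEX idx_hits_page (page=?)
    ```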

    I built my site and it gets 10 million pageviews a day now. It’s a top 10 site in Canada in terms of pageviews, and #83 in the USA on Monday according to hitwise.com. The entire site has 1 web server and 1 DB server doing 99% of the work. The site also has a mail server and 1 image server.

    I run the only “web 2.0” company in my space, and every single competitor my size has around 200 servers and a support staff of 20-40 people. I just don’t get it. In short, if you have scalability issues at less than 5 million pageviews a day, you’ve got issues with the knowledge level of your staff.

  12. All of the freeware on this page is unsupported by SGI. http://www.sgi.com/fun/freeware/web.html

    A better way to go would be for all the freeware to be supported. SGI could easily offer a support link and charge a small fee for tech help with the freeware. SGI could also provide links to businesses that do support this freeware. The little company running freeware today could be the Google of the future.

  13. Paul Graham asks, “Does ‘Web 2.0’ mean anything?”
    This is at http://www.paulgraham.com/web20.html
    “The conference itself didn’t seem very grassroots. It cost $2800, so the only people who could afford to go were VCs and people from big companies.”
    The whole thing is worth reading. I thought it was, anyway. Web 2.0 had a $2800 toll booth, which kept a great deal of talent from being involved. Personally I’d rather spend $2800 on other stuff. From the 2.0 site: “Why attend? The Internet is a critical component of the strategy and infrastructure of every successful company today. At its most disruptive, it redefines markets and creates entirely new opportunities. More than 50 thought leaders and entrepreneurs are slated to present in an interactive format stressing audience participation.”

    For $2800 you can be a Web 2.0 thought leader. Follow the thought leader or follow your own lead.
    If you missed Web 2.0, there’s always something new to explore. Web 2.0 is over. We are now at Web 2.1

    Web 2.1: A BrainJam for the rest of us
    “The event raised over $1,000 for the Internet Archive and we are donating more than $100 to the Creative Commons fundraising drive. In our efforts to be a transparent organization, we posted an overview of the event financials for everyone to see who is interested. My personal wrap up is here and you can still visit the event Wiki, the original event site and the Insytes Blog for more details.” http://www.web2point1.org/

  14. Part of the reason scalability matters is that the web IS the business for most Web 2.0 companies. As a result, a failure to scale becomes a major impediment to growth and can deflate the growth curve. Technorati and SixApart narrowly averted disaster, but it cost them in reputation (in Technorati’s case) and real dollars (in SixApart’s). Google Analytics got mostly bad word of mouth because its systems failed to scale… etc, etc. In Web 1.0 we made some of the same mistakes, but on the other side, by overestimating demand, which is almost as bad, since it forces you to spend more than you should on your infrastructure.

  15. I would still say scalability should start with code design. Things like database slave replication (MySQL seems to support this just fine), faster code invocation (say, the FastCGI way), fewer libraries and less code bloat, and extensible code design are your cheapest and easiest ticket to a scalable web service.
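
    On the replication point, the application-side half is just read/write splitting. A sketch with made-up hostnames and a stub where a real driver call (e.g. MySQLdb.connect) would go:

    ```python
    import random

    MASTER = "db-master.internal"
    SLAVES = ["db-slave1.internal", "db-slave2.internal"]

    def connect(host):
        # stub standing in for a real driver call such as
        # MySQLdb.connect(host=host)
        return host

    def connection_for(sql):
        """Send writes to the master; spread reads across the slaves."""
        if sql.lstrip().upper().startswith("SELECT"):
            return connect(random.choice(SLAVES))
        return connect(MASTER)
    ```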

    Immediately investing in a lot of hardware at the start is not just overkill, it’s suicide. As Jason F mentioned, why invest in something that’s not yet there?

    I posted my dev rants on making a scalable web app on my website.

  16. Jason is wrong about uptime/downtime. If Wal-Mart is down for half an hour I’ll come back later, because there’s really no competition at those prices. If Technorati is down I’ll immediately start looking elsewhere: maybe Google blog search, Yahoo blog search, Sphere, PubSub, Feedster, etc. There’s no shortage of alternatives to try out, with absolutely $0.00 in switching costs. To make matters worse, I might even like one of those other search engines and remain a loyal user forever.

    Even if I went to a Wal-Mart competitor for a day, it’s unlikely I’d be a permanently lost customer.

  17. A software developer’s perspective: in software development, there is an unwritten rule not to worry about performance/scalability at the beginning, because it’s easy to overengineer. Usually, performance/scalability problems are welcome, because they mean the software/service is becoming popular. That said, this is also what differentiates great developers from the average: their designs are simple, robust and scalable (e.g., Unix).

  18. Walmart has the credit card industry supporting their card processing network. They don’t need to worry about it going down. It’s redundant, secure and scalable beyond just about anything. The average person can do the same thing as Walmart from a desktop with Paypal or a similar payment website.

    There’s always competition. With no inventory costs, you can price things lower than Walmart. That’s why people shop online. Shipping is certainly cheap enough, but UPS scales and wants your business. You can always work or shop at Walmart if that’s your kind of gig. The idea of less competition in a network environment seems kind of off base.

  19. A funny story.
    The local news folks have been running stories about the new Xbox 360. Some of the stories are about overheating and crashing problems, and ask whether you should wait or buy an Xbox 360 this Christmas. The problems seem pretty isolated and the thing has a strong warranty, so it doesn’t really matter. The big story is that you can’t find the things in a local store at list price. I found one for $1,300.00 online. I’m not holding off because it could be defective; it’s the sticker shock I can’t handle.

    I saw a 360 ad on TV last night. I think they are spending the budget on marketing while manufacturing has problems scaling. Microsoft has huge financial resources, yet you can’t find their hot new product. Size doesn’t always matter. The Xbox 360 is a big deal. You would think they could make them fast enough to live up to the marketing hype. The thing would almost sell itself if it weren’t missing in action.
    They could have made a ton of cash off the Xbox 360. They spent a ton on marketing, with store kiosks and the works. It was a big deal.

  20. I know everyone has already chimed in on this, but seeing as this is something I deal with a lot, I figured I would take a stab at it. One of the main reasons web 2.0 companies don’t scale is that the guys writing the code have no experience with MySQL clusters and other enterprise-level services (something relatively few people have). The second is that for someone giving away their nifty Ajax product, paying through the nose for extra unused servers is nuts. Unless you have a revenue stream (and most don’t), you need to focus more on your business model/plan than on whether your 40 load-balanced Xeon servers are clustering.

    If you aren’t running a service like Basecamp, then it makes sense to stagger your user growth (it’s not like having a million users and no one getting access is going to make you money if you give everything away for free).

    BTW, finding good resources on clustering/redundancy is easier said than done. Unless you’ve got an IT degree, most of it is incomprehensible.

  21. Clustering and load balancing are infrastructure scalability solutions, not application scalability solutions. Clustering and load balancing do provide high availability, but they may not provide much in the way of scalability. Scalability and security must be designed into the applications. The number of servers is not the issue; how efficiently you can scale your application is what matters. For example:

    1. Google’s search server farm has over 400,000 servers with very little clustering, and they do scale, don’t they?
    2. Inktomi got buried by Google because they used clustering and load balancing instead of a distributed application architecture.

    Ramana Kovi
    ePlatform, Inc.

  22. “The number of servers is not the issue; how efficiently you can scale your application is what matters. For example: Google’s search server farm has over 400,000 servers with very little clustering, and they do scale, don’t they?”

    I don’t get it.
    Way too technical for me, I guess.
    I bet they have a bunch of code I’d never figure out. 400,000 is a lot of servers. I wonder what the electric bill is. Once we have more wifi, we’ll need less server farming just to find stuff on other servers. Maybe you’ll just be able to store stuff in the air. More room to grow corn and spuds.

  23. Jason Fried thinks it costs tens of thousands of dollars to go from 98% to 99.9% uptime.

    This is again another misconception in the scalability arena. System and network vendors want you to design with load balancing and clustering, but the best solution to the uptime and scalability problem is to design a distributed application with data partitioned into multiple databases. The basic question is: do you want to scale by infrastructure, or scale by application design through distributed computing? If you are an online business, you must drive toward high scalability and uptime. If you approach scalability through clustering and load balancing, yes, it can cost you thousands or even millions of dollars. Take an online social networking website as an example: it is a lot easier to write a SQL JOIN against a single database to find relationships, but when the data gets large, a single database becomes a bottleneck. If you want to scale, spread your data onto multiple databases and design a multi-database query engine to cross-join across the database servers. Recently SAP bought Callixa to do just that. Link: http://sapventures.typepad.com/main/2005/12/sap_picks_up_in.html#comments
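
    In outline, such a query engine scatters the query to every partition and gathers and joins the partial results in the application layer. A minimal sketch; run_query and the hosts are placeholders, not a real driver API:

    ```python
    PARTITIONS = ["db0.internal", "db1.internal", "db2.internal"]

    def run_query(host, sql):
        # placeholder for a real remote query over a driver connection
        return []

    def friends_of(user_id):
        """Scatter the lookup to every partition, gather the partial rows."""
        sql = "SELECT friend_id FROM friends WHERE user_id = %d" % user_id
        rows = []
        for host in PARTITIONS:
            rows.extend(run_query(host, sql))
        return rows
    ```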

    Hope that helps
    Ramana Kovi

  24. Sony 2.0|666 “built to flip”
    SunnComm’s MediaMax? That’s the other copy protection system Sony BMG has shipped on some of its music CDs in an effort to cut down on piracy.
    It can cost you thousands or even millions of dollars. MediaMax is a different copy protection system than the “rootkit” DRM that has been drawing all the attention.

    Sony opens up over another CD security hole, The Register reported today:
    “other severe problems with MediaMax discs, including: undisclosed communications with servers Sony controls… undisclosed installation of over 18 MB of software regardless of whether the user agrees to the End User License Agreement; and failure to include an uninstaller with the CD.”
    Also:
    The Texas Attorney General reportedly filed a lawsuit on Monday against Sony BMG Music Entertainment under anti-spyware laws.

    RIAA thought the problem was the user. This seems like a real problem. I wonder if Sony will have some downtime.

  25. Fact is, services do not fail because their operators can’t scale them. They fail due to an inability to attract paying customers. In the scheme of things, scaling a service is relatively easy, but getting customers and making money is very difficult.

  26. Let’s back up and first ask “What IS Web 2.0?”

    It turns out that “Web 2.0” is hype and that the correct way of achieving scale is what preceded “Web 2.0”, namely the principles of REST (REpresentational State Transfer) which were used to design the WWW:
    http://rest.blueoxen.net/cgi-bin/wiki.pl

    Do scale and size matter? Certainly if you’re Google. Probably not if you have little traffic.

    But in any case all discussion of “Web 2.0” is pure advertising and a waste of time. Each vendor selling “Web 2.0” has a different definition of it and none is worth buying. Best to let these firms die while they’re young!

  27. “If you want to scale, spread your data onto multiple databases and design a multi-database query engine to cross-join across the database servers.”

    That would severely degrade your performance due to blocking, and the whole thing would crash quicker than if you had it on a single machine.

    The only thing that works at that level is to replicate part of the database to another server and then run queries off that. I’ve never heard of a single major company attempting to chain together DB servers and then run queries across them. Every major social networking company or dating site replicates the DB to other servers and runs queries against those.

  28. Marcus,

    Here are a couple of good papers from ACM/IEEE transactions:

    P.M.G. Apers, A.R. Hevner, and S.B. Yao, “Optimization algorithms for distributed queries,” IEEE Transactions on Software Engineering, vol. 9, no. 1, 1983.

    Jaideep Srivastava, “Optimizing multi-join queries in parallel relational databases,” Proceedings of the Second International Conference on Parallel and Distributed Information Systems, San Diego, California, January 1993.

    A lot of research has gone into this area over the last 25 years.

    Regarding real-life examples, ask anybody who is running terabyte-size databases with directed-graph problems. IC design placement and routing, web search engines, and security agencies are good places to start.

    Thrashing has nothing to do with a multi-database query engine. That is a network issue, waiting for query results to return from multiple database servers.

  29. I read that paper and it only talks about distributing a query across multiple CPUs within a SINGLE server. This is how every DB on the market currently functions, and every DB has options that limit how many CPUs a query can use, how much RAM, etc.

    As far as I know, there isn’t a single database out there that recommends running queries between machines, because there is no way to make it scale. The OS needs to lock the pages in RAM and maintain index locks for queries.

    The articles you reference talk about a computer with N drives and X CPUs all running under a single operating system, basically a custom-built supercomputer.

    No web 2.0 company is going to spend 20 million on a supercomputer when they can cluster together $5,000 off-the-shelf PCs and have a great database cluster.

  30. That was a very fast read. I am not sure what you read; it is called “multi-join queries in PARALLEL RELATIONAL databases.” The paper talks about queries across PARALLEL RELATIONAL databases, i.e., multiple database servers. It is up to you where to place the database servers: in a single computer under a single OS, or spread across multiple computers (CPUs).

    Good luck with your approach. Bye!

  31. I see what you’re talking about now. Hypercubes and the like cost tens of millions of dollars. For the one that Lord of the Rings was rendered on, AMD basically gave them the hardware for free for the bragging rights, as both Intel and AMD commonly do.

    I thought you meant this was an actual solution for web 2.0 companies, but you’re talking about massive corporations doing massive data processing, not a transaction-processing environment. In other words, you’re starting with a fixed data set and rendering something out of it. On a website you’re working with a dataset that is constantly changing.

    I was part of a team a few years ago that went in and optimized what was, according to Microsoft, the world’s largest commercial SQL Server implementation at the time. Hypercubes work great for math problems or rendering movies, but they fail miserably when it comes to online multithreaded transaction processing. The main reason is that no one else has spent the hundreds of millions it takes to build a database server; Oracle, MySQL, and SQL Server are the only products with the features required for OLTP.

  32. Common sense: 1. Plan to scale. 2. Get users. 3. Scale.

    The key seems to be to factor scalability into your initial planning. That doesn’t mean you have to get the chequebook out before you need to.

    The hardest step is step 2. Creating a business that drives customers and revenue is the toughest nut to crack. Step 1 needs to be done well, but is not where the business risk truly lies.

  33. Les blogs’ website is actually lesblogs.typepad.com.

    One of the issues Web 2.0 companies have in building solutions on the cheap is that they don’t plan for real scalability of their infrastructure. This means they melt down whenever their traffic/audience grows faster than their ability to add servers/gear. And at some point, the whole thing needs to be rebuilt anyway to plan for the next stage in scalability.

  34. Scalability is neither a Web 2.0 nor a Web 1.0 problem. It is a basic business issue.

    Companies can only grow their profits by increasing their revenue or decreasing their costs. Web servers are a fixed cost, and as more users join a service they can quickly absorb all of a company’s capital.

    One alternative is to look at a different architecture, where the processing power and disk space are distributed instead of centralised. And I’m not talking Ajax here, but a truly distributed architecture using the web as a platform and the user’s machine for processing and hosting data.

    That’s what we do at AllPeers, which means we can scale infinitely, since each new user who joins brings his computing power with him.

  35. Ironic, as the URL is an MT-TP account, so who knows what you’ll see.

    I believe today’s 6A outage has lit a fuse under a gasoline-soaked segment of the high-tech industry: one that has been doing a lot of good, but which has been shown up for the immature child it really is.

    That immaturity manifests itself in a failure to recognise the scaling issue. It is why the IBMs of this world earn billions of dollars trying to hold this fragile beast we call the Internet together, for the sake of those in the real world who have to earn money buying and selling real goods and services that people physically consume in their real lives. Not virtually.

    This time around, large corporations that were teetering on the edge of immersing themselves in this ‘stuff’ may well shrink back.

    If I’m remotely correct, it will be a sad day.

  36. At Omnidrive we are preparing for our public release by having both a large hardware supplier and a hosting company that owns co-lo centers as early investors and as members of our board and advisory committee.

    With a good business plan and product, companies in these industries are open to such involvement. I can’t believe it doesn’t happen more often with web 2.0 companies, considering that the supply of hardware and rackspace are such important parts of the business.

    I don’t agree with the 37signals approach, nor do I believe that close to 100% uptime costs hundreds of thousands of dollars (actually, I know it doesn’t).

  37. “i think you folks are missing the point. i think ramana is spot on. it is about the right architecture, and the ability to think through the what-if problem. the issue is not throwing servers at the problem; it is building a scalable architecture, and that is what a lot of companies are not thinking through.”

    Spoken like an old-school architect. I suggest reading about agile methodologies and the shared-nothing architecture. Architects almost always get it wrong. Requirements (and possibly the entire direction of the business) change, and the architecture often becomes obsolete… so all the up-front work is wasted.

    I remember the old days, walking around the Exodus data centers looking at the millions of dollars of equipment (Sun E10Ks, etc.) bought for startups that had zero users. Most of those companies are dead now. Maybe some of them would still be around if they hadn’t shot their VC load on “building a scalable architecture” for an empty, zero-traffic site. Perhaps they should have homed in on a good service that provided real value…

    here’s my take:
    1) Hire really freaking smart people who won’t do retarded things
    2) keep your architecture as simple as possible. add complexity when you actually need it.
    3) optimize when needed.

    If you have good coders and keep it simple, you’ll be able to make the changes for unforeseen business requirements/load.

  38. I doubt any startup company really expects to be the next web icon, especially in the face of vicious competition. Traffic doesn’t spike overnight. You accommodate the demand as the need arises. For example, what sense does it make to write code that handles a cluster of servers when you only get a few hundred visits a day in the beginning?

  39. Pingback: Thought Leadership

  40. Just reading through the post:
    “1) Hire really freaking smart people who won’t do retarded things
    2) keep your architecture as simple as possible. add complexity when you actually need it.
    3) optimize when needed.”
    I think this is the way.

  41. Just read through the post.
    “1) Hire really freaking smart people who won’t do retarded things
    2) keep your architecture as simple as possible. add complexity when you actually need it.
    3) optimize when needed.”

    It’s the way.

