One thought on “Hadoop-Focused Startup Cloudera Raises $5 Million”

  1. How do you analyze terabytes of data – whether it it clickstream logs, financial transactions, call data records, or retail purchases? Reading through it one record at a time won’t work, since the data typically grows faster than a sequential program can read it. As Google and others have proved, the only way is with massive parallelism – i.e. have many servers work together to do the processing. And to do this you can either use an MPP (massively parallel) database engine such as Teradata, Netezza or Greenplum, or you can write your own parallel program.

    However, traditional parallel programming really is rocket science. The big innovation of Google’s MapReduce paradigm is that it forces developers to write programs in a simplistic model that is easily parallelized. And it has led to a number of implementations outside of Google — the open-source Hadoop project, Greenplum’s in-database MapReduce (which runs MapReduce and SQL natively on the same engine), and even an implementation that runs within NVIDIA GPUs.

    The challenge of Hadoop is that it is fairly raw, hard to setup and lacking in enterprise support. For folks who are serious about making Hadoop work, the team at Cloudera are bringing hands-on experience that is sorely needed.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.