Using MapReduce is Like Plumbing with Pre-Clogged Pipes
MapReduce is no longer the only way to process data on Hadoop. In fact, it’s arguably the worst Hadoop data processing framework. By now, everyone knows how awesome Hadoop is for large scale, data...
The Tragedy of Tez
Tez is one of the marvelous ironies of the fast-moving big data and open source software space, a piece of brilliant technology that was obsolete almost as soon as it was released. In the second in my...
The Spark that Set the Hadoop World on Fire
Spark is the darling of the open source community right now. It’s setting the Hadoop world on fire with its power and speed in large-scale data processing on Hadoop clusters. Spark is one of the most...
Data Day Texas Happy Hour Takeaways
I learned a lot at Data Day Texas. I live-tweeted a lot of interesting bits as @RobertsPaige as I went along, but some of the most enjoyable and enlightening stuff happened at the happy hour...
Cyber Security with Apache Metron and Storm
A few weeks ago at Hadoop Summit, I caught up with some friends from the project I worked on last year with Hortonworks, including Ryan Merriman, who is now an Apache Metron architect. Since Apache...
Owen O’Malley on the Origins of Hadoop, Spark and a Vulcan ORC
Owen O’Malley is one of the folks I chatted with at the last Hadoop Summit in San Jose. I had already discovered the first time I met him that he was the big Tolkien geek behind the naming of ORC files, as...