There are several online references on how to run Apache Hadoop® (referred to as “Hadoop” in this article) in Docker for demo and test purposes. In a previous blog post, we described how we containerized Uber’s production Hadoop infrastructure spanning 21,000+ hosts.
HDFS NameNode is the most performance-sensitive component within large multi-tenant Hadoop clusters. We had deferred NameNode containerization to the end of our containerization journey to leverage learnings from containerizing other Hadoop components. In this blog post, we’ll share our experience on how we containerized HDFS NameNodes and architected a zero-downtime migration for 32 clusters.