How Netflix built its real-time data infrastructure
What makes Netflix, Netflix? Creating compelling original programming, analyzing its user data to serve subscribers better, and letting people consume content in the ways they prefer, according to Investopedia’s analysis.
Few people would disagree, but not many are familiar with the backstory of what enables Netflix to analyze its user and operational data to serve subscribers better. During Netflix’s global hyper-growth, business and operational decisions relied on faster logging data more than ever, says Zhenzhong Xu.
Xu joined Netflix in 2015 as a founding engineer on the real-time data infrastructure team, and later led the stream processing engines team. He developed an interest in real-time data in the early 2010s, and has believed ever since that there is much value yet to be uncovered in this area.
Recently, Xu left Netflix to pursue a similar but expanded vision in the real-time machine learning space. Xu describes the development of Netflix’s real-time data infrastructure as an iterative journey that took place between 2015 and 2021. He breaks this journey down into four evolving phases.
Phase 1 involved rescuing Netflix logs from the failing batch pipelines. To do so, Xu’s team built a streaming-first platform from the ground up to replace them.