Netflix automates dynamic repartitioning to fix Cassandra wide partitions at scale
Netflix's engineering team explains how it built an automated system that dynamically repartitions time series workloads in Apache Cassandra. The team's TimeSeries Abstraction divides data into discrete time chunks, but the partitioning chosen at provisioning time often proves wrong when a workload is misestimated or changes over time, which produces wide partitions. To correct this, the team developed background workers that monitor partition histograms via Cassandra virtual tables and automatically adjust future time slices to keep partition density near a target. This approach reduced tail latency, thread queueing, and timeouts across thousands of datasets without requiring manual reconfiguration or expensive cluster scaling.
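
The adjustment step performed by the background workers can be illustrated with a minimal Python sketch. It is not Netflix's implementation: the names `next_bucket_seconds`, `observed_p99_bytes`, and `target_bytes`, the clamping bounds, and the assumption that partition size grows roughly linearly with the width of a time slice are all hypothetical. Reading the histogram from Cassandra is left out; the observed percentile is passed in as a plain number.

```python
# Hypothetical sketch: choose the width of FUTURE time slices so that the
# observed partition size moves toward a target. Existing slices are untouched.

MIN_BUCKET_SECONDS = 60                 # assumed lower bound (1 minute)
MAX_BUCKET_SECONDS = 30 * 24 * 3600     # assumed upper bound (30 days)


def next_bucket_seconds(
    current_bucket_seconds: int,
    observed_p99_bytes: int,
    target_bytes: int,
) -> int:
    """Return the slice width to use for future time slices.

    Assumes partition size scales linearly with slice width, so the width
    is rescaled by target / observed and clamped to the assumed bounds.
    """
    if observed_p99_bytes <= 0:
        # No usable measurement yet: keep the current configuration.
        return current_bucket_seconds

    # Integer arithmetic avoids floating-point rounding surprises.
    proposed = current_bucket_seconds * target_bytes // observed_p99_bytes
    return max(MIN_BUCKET_SECONDS, min(MAX_BUCKET_SECONDS, proposed))


if __name__ == "__main__":
    # Partitions are 4x the target, so future slices shrink from 1 day to 6 hours.
    print(next_bucket_seconds(86_400, 400_000_000, 100_000_000))
```

Because only future slices are resized in this sketch, no existing data has to be rewritten, which matches the summary's point that the correction avoids manual reconfiguration.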