Pinterest Deduplicates URLs at Scale Using Content Fingerprints
Pinterest engineers built MIQPS, a data-driven URL normalization system that replaces manual allowlists for large-scale content deduplication. By rendering pages and comparing content fingerprints, it determines which query parameters are essential and which are tracking noise, then applies these rules at runtime. The architecture separates expensive offline analysis from lightweight runtime processing, cutting infrastructure costs while handling the long tail of millions of heterogeneous merchant domains.
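The offline/runtime split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pinterest's implementation. The "renderer" is a caller-supplied stand-in for a headless browser. The fingerprint is an exact SHA-256 hash over whitespace-normalized text; a production system would likely need a fingerprint that tolerates dynamic page elements. The function names `find_essential_params`, `normalize`, and `fake_render` are all hypothetical. In this sketch, a query parameter counts as essential when removing it changes the page fingerprint.

```python
import hashlib
from typing import Callable, Dict, Set
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: a renderer is any callable mapping a URL to its visible page text.
Renderer = Callable[[str], str]


def fingerprint(content: str) -> str:
    """Exact-match fingerprint over whitespace-normalized content (an assumption;
    a real system might use a near-duplicate-tolerant hash instead)."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def _without_param(url: str, name: str) -> str:
    """Return the URL with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def find_essential_params(url: str, render: Renderer) -> Set[str]:
    """Offline (expensive) step: drop each parameter in turn and keep it only
    if removing it changes the rendered page's fingerprint."""
    baseline = fingerprint(render(url))
    params = {k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    return {p for p in params if fingerprint(render(_without_param(url, p))) != baseline}


def normalize(url: str, rules: Dict[str, Set[str]]) -> str:
    """Runtime (cheap) step: no rendering, just a rule lookup by domain.
    Keeps only essential parameters, sorted, and drops the fragment.
    Domains with no learned rules are left unchanged (a conservative assumption)."""
    parts = urlsplit(url)
    essential = rules.get(parts.netloc)
    if essential is None:
        return url
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k in essential
    )
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


if __name__ == "__main__":
    # Hypothetical merchant page whose content depends only on `id`.
    def fake_render(url: str) -> str:
        query = dict(parse_qsl(urlsplit(url).query))
        return f"Product page for item {query.get('id', 'none')}"

    sample = "https://shop.example/p?id=42&utm_source=pin&ref=abc"
    rules = {"shop.example": find_essential_params(sample, fake_render)}
    print(rules)  # {'shop.example': {'id'}}
    print(normalize("https://shop.example/p?utm_source=x&id=42&ref=y", rules))
    # https://shop.example/p?id=42
```

The design choice mirrors the split the article describes. Rendering is costly, so it runs once per domain in a batch job that produces a small rule table. The runtime path only parses the URL, looks up the table, and rebuilds the query string. That keeps per-request work cheap even across millions of merchant domains.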