Pinterest Scales URL Deduplication Using Content Fingerprints
Pinterest engineers built MIQPS, a data-driven URL normalization system that replaces manually maintained allowlists for large-scale content deduplication. It renders pages and compares content fingerprints to determine which query parameters affect page identity, keeping those that do and stripping tracking noise. Because the expensive analysis runs offline and only precomputed rules are applied at runtime, the system reduces the infrastructure cost of redundantly fetching and indexing the same content across millions of merchant domains.
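
The idea of testing which query parameters change a page's fingerprint can be illustrated with a minimal Python sketch. This is not Pinterest's implementation: the function names (`fingerprint`, `important_params`, `normalize`), the SHA-256 hash standing in for a content fingerprint, the drop-one-parameter-at-a-time comparison, and the `fake_render` stub that replaces real page rendering are all assumptions made for illustration.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fingerprint(content: str) -> str:
    """Hash rendered content (a stand-in for a real content fingerprint)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def without_param(url: str, name: str) -> str:
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def important_params(sample_urls, render) -> set:
    """Offline step (assumed): a parameter is important if removing it
    changes the fingerprint of the rendered page for any sample URL."""
    important = set()
    for url in sample_urls:
        baseline = fingerprint(render(url))
        for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            if fingerprint(render(without_param(url, name))) != baseline:
                important.add(name)
    return important


def normalize(url: str, keep: set) -> str:
    """Runtime step (assumed): keep only precomputed important parameters,
    sorted so that equivalent URLs map to the same string."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                  if k in keep)
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


def fake_render(url: str) -> str:
    """Hypothetical renderer: page content depends only on the `id` parameter."""
    params = dict(parse_qsl(urlsplit(url).query))
    return "product page for id=" + params.get("id", "none")


# Hypothetical sample URLs for one merchant domain.
samples = [
    "https://shop.example/item?id=42&utm_source=pin",
    "https://shop.example/item?id=7&ref=home",
]
rules = important_params(samples, fake_render)
canonical = normalize("https://shop.example/item?utm_source=pin&id=42", rules)
```

In this sketch the costly rendering happens only inside `important_params`, while `normalize` is a cheap lookup against the precomputed set, which mirrors the offline/runtime split described above. A production system would presumably need fingerprints that tolerate dynamic page content and rules scoped per domain; the sketch ignores both.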