Product
Problem
- We want to keep using existing tools like Google Analytics™ without breaking them (e.g., bounce rates, visit durations).
- We don't want collected data to be linkable to any identifiable individual, by any means (e.g., cookies, fingerprints, IP addresses, User-Agent strings).
- We want cookie consents to get out of the way of our users.
- We want to comply with past, current, and future data privacy laws.
Solution
We collect minimal data, send it to a proxy for identification and anonymization, and forward it to the service provider.
Collection
We focus on the minimum data required by analytics, which preserves users' privacy and prevents data leaks. Over time and as needed, we will collect more data, as long as doing so keeps individuals from being singled out (cf., notion of singling out ↗).
- Page view
- Page hostname and path (cf., privacy concerns of query strings ↗)
- Page title
- Referrer hostname (cf., privacy concerns of referrers ↗)
- More coming (e.g., UTM codes), provided that users' privacy is not undermined.
Both the page path and title must be free of personally identifiable information (PII). If there is any possibility of your paths or titles containing PII, you'll need to remove it.
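As an illustration of the collection rules above, here is a minimal Python sketch of how a page view could be reduced to the listed fields. The function name `build_page_view` and the payload keys are hypothetical and not part of the product's API; the sketch only shows that the query string and fragment are dropped and that the referrer is reduced to its hostname.

```python
from urllib.parse import urlsplit


def build_page_view(page_url: str, title: str, referrer: str = "") -> dict:
    """Reduce a page view to hostname, path, title, and referrer hostname.

    The payload shape is an assumption made for this example.
    """
    page = urlsplit(page_url)
    payload = {
        "type": "pageview",
        "hostname": page.hostname or "",
        # Only the path is kept: the query string and fragment are discarded.
        "path": page.path or "/",
        "title": title,
    }
    # Keep only the referrer's hostname, never its path or query string.
    ref_host = urlsplit(referrer).hostname if referrer else None
    if ref_host:
        payload["referrer"] = ref_host
    return payload


if __name__ == "__main__":
    print(build_page_view(
        "https://example.com/pricing?email=jane%40example.com#plans",
        "Pricing",
        "https://www.google.com/search?q=private+query",
    ))
```

Note that this does not detect PII embedded in the path or title itself (for example `/users/jane.doe`); removing that remains the site owner's responsibility.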
Identification
The IP address and the User-Agent are personally identifiable information (PII) ↗ baked into the communication protocol.
To anonymize identities irreversibly and make it impossible to identify the underlying individual, we map the (API key, IP, UA) tuple to a cryptographically secure pseudorandom identifier (generated from a minimum of 384 bits of entropy ↗).
As a consequence:
- As the data controller, you can't link a data subject to an identifier (and vice versa).
- As data processors, third-party providers (e.g., Google Analytics™) can't link an identifier back to a data subject (and vice versa).
As the trusted tier:
- We ensure individuals are no longer identifiable even in a post-quantum world ↗.
- We apply individuals' right to be forgotten by destroying the mapping after 24h.
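The mapping described in this section can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not the actual implementation: the class `IdentifierMap` and its methods are hypothetical, the identifier is taken to be 48 random bytes (384 bits) from the operating system's CSPRNG, and the HSM-backed entropy and encrypted storage are not modeled. The point it shows is that the identifier is drawn at random rather than derived from the tuple, so it cannot be reversed, and that an expired mapping leaves nothing linking old and new identifiers.

```python
import secrets
import time

ID_BYTES = 48            # 48 bytes = 384 bits from the OS CSPRNG (assumption)
TTL_SECONDS = 24 * 3600  # the mapping is destroyed after 24h


class IdentifierMap:
    """In-memory stand-in for the proxy's mapping store (hypothetical)."""

    def __init__(self, now=time.time):
        self._now = now
        self._entries = {}  # (api_key, ip, ua) -> (identifier, expires_at)

    def identify(self, api_key: str, ip: str, ua: str) -> str:
        """Return the identifier for a tuple, creating one if none is live."""
        self.purge()
        key = (api_key, ip, ua)
        entry = self._entries.get(key)
        if entry is None:
            # The identifier is random, not a hash of the tuple, so it
            # carries no information about the IP address or User-Agent.
            entry = (secrets.token_hex(ID_BYTES), self._now() + TTL_SECONDS)
            self._entries[key] = entry
        return entry[0]

    def purge(self) -> None:
        """Delete every mapping older than the TTL."""
        now = self._now()
        expired = [k for k, (_, exp) in self._entries.items() if exp <= now]
        for k in expired:
            del self._entries[k]
```

Within the 24h window the same visitor keeps the same identifier, which is what keeps metrics such as bounce rates and visit durations meaningful. After the window, the same tuple receives a fresh, unrelated identifier.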
Anonymization
- We do not forward the IP address. Analytics does not need it.
- We forward the country and the city of the anonymized IP address using MaxMind GeoLite2™ ↗.
- We forward a redacted version of the User-Agent containing only the type of the browser and operating system.
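To make the User-Agent redaction concrete, here is a rough Python sketch that keeps only a browser family and an operating system family. The function `redact_user_agent`, the pattern tables, and the `"Browser / OS"` output format are assumptions made for illustration; the actual rules and forwarded format are not documented here. The geolocation lookup is not shown.

```python
import re

# Order matters: Edge and Opera UAs also contain "Chrome/",
# and Chrome UAs also contain "Safari/".
BROWSERS = [
    ("Edge", re.compile(r"Edg(e|A|iOS)?/")),
    ("Opera", re.compile(r"OPR/|Opera")),
    ("Firefox", re.compile(r"Firefox/|FxiOS/")),
    ("Chrome", re.compile(r"Chrome/|CriOS/")),
    ("Safari", re.compile(r"Safari/")),
]

# Order matters: Android UAs contain "Linux",
# and iOS UAs contain "like Mac OS X".
SYSTEMS = [
    ("Android", re.compile(r"Android")),
    ("iOS", re.compile(r"iPhone|iPad|iPod")),
    ("Windows", re.compile(r"Windows")),
    ("macOS", re.compile(r"Mac OS X|Macintosh")),
    ("Linux", re.compile(r"Linux|X11")),
]


def _first_match(table, ua: str) -> str:
    """Return the first family whose pattern occurs in the UA string."""
    for name, pattern in table:
        if pattern.search(ua):
            return name
    return "Other"


def redact_user_agent(ua: str) -> str:
    """Drop versions, device models, and build tokens from a User-Agent."""
    return f"{_first_match(BROWSERS, ua)} / {_first_match(SYSTEMS, ua)}"
```

Collapsing the header to a handful of coarse families removes the version and device details that make User-Agent strings useful for fingerprinting.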
Security
- Encryption keys and entropy sources are backed by a FIPS 140-2 Level 2 validated HSM.
- Anything in transit uses TLS/SSL encryption.
- Anything at rest uses AES-256-GCM encryption.
- Anything at rest is deleted after 24h.