I recently decided to build a user authentication system with a simple but strict rule: store as little personal information as possible. The goal was to only keep hashed versions of emails and passwords in the database. No plaintext data. This idea was largely inspired by recent data leaks, like the one at 4chan, where user emails were exposed. I wanted to see if I could build a system where even a full database leak wouldn’t reveal user emails.
Here is a detailed story of that process, including a critical mistake I made and how I fixed it.
The First Attempt: A Critical Flaw
My initial plan seemed logical. If I was hashing passwords for security, why not do the exact same thing for emails? I wrote a simple function to hash any email that came through during registration.
Here is the code for my first, flawed idea:
// My first attempt at hashing emails. This was a mistake.
async function hashEmail_MyBadIdea(email: string) {
  // This uses a random salt every time, just like for passwords.
  return await Bun.password.hash(email, { algorithm: "bcrypt" });
}
A user with the email user@example.com came to register. My code hashed their email, which turned into something like $2b$10$abc123..., and I saved it. So far, so good.
The problem started when the same user, maybe forgetting they had an account, tried to register again with the same email. My code was supposed to check the database and say, “Hey, this email is already taken.”
Instead, when it hashed user@example.com for the second time, the result was completely different, something like $2b$10$xyz789.... Because the hash was different, my database check failed to find a match, and it created a brand new, duplicate account.
The reason this happened is that password hashing algorithms like bcrypt are deliberately non-deterministic. They generate a fresh random “salt” (a short string of random bytes) on every call and mix it into the hash. This is a crucial security feature for passwords: two users with the same password end up with different hashes, and precomputed lookup tables become useless. But for checking whether an email already exists, this feature becomes a bug.
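To see the effect in isolation, here is a minimal sketch. It uses scrypt from node:crypto (which Bun also provides) as a stand-in for bcrypt, purely so the salt is visible in the code. The helper name saltedHash and the output format are made up for this illustration.

```typescript
import { randomBytes, scryptSync } from "node:crypto";

// Hypothetical helper: hash a value with a fresh random salt on every call,
// which is what password hashers such as bcrypt do internally.
function saltedHash(value: string): string {
  const salt = randomBytes(16);             // new random salt each time
  const hash = scryptSync(value, salt, 32); // 32-byte derived key
  return `${salt.toString("hex")}$${hash.toString("hex")}`;
}

const first = saltedHash("user@example.com");
const second = saltedHash("user@example.com");

// Same input, different salts, so the two stored strings differ and an
// equality lookup in the database cannot find the earlier row.
console.log(first === second); // false
```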
The Fix: Deterministic Hashing with a Fixed Salt
To solve this, I needed a hashing method that was deterministic. This is a simple concept: the same input should always produce the exact same output.
I created a new function for email hashing that uses a fixed, secret salt that I define (a secret value used this way is often called a “pepper”). This way, the process is repeatable.
Here is the corrected code:
// The corrected, deterministic email hashing function.
async function hashEmail(email: string): Promise<string> {
  // A secret, fixed salt that only I know.
  const MY_SECRET_SALT = "a-very-long-and-secret-string-that-never-changes";
  // First, normalize the email to avoid case issues.
  const lowercaseEmail = email.toLowerCase().trim();
  // Combine the email with my fixed salt.
  const saltedEmail = MY_SECRET_SALT + lowercaseEmail;
  // Use a predictable hashing algorithm like SHA-256.
  const encoder = new TextEncoder();
  const data = encoder.encode(saltedEmail);
  const hashBuffer = await crypto.subtle.digest('SHA-256', data);
  // Convert the hash to a string.
  return Array.from(new Uint8Array(hashBuffer))
    .map(byte => byte.toString(16).padStart(2, '0'))
    .join('');
}
With this new function, the email user@example.com will always produce the exact same hash. Now, my duplicate check works perfectly, and the problem is solved.
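As a rough sketch of how the duplicate check can look with a deterministic hash, the example below swaps the salt-plus-email concatenation for HMAC-SHA-256, a common construction for keyed hashes. That substitution, the names emailFingerprint and register, the placeholder secret, and the in-memory Set standing in for the database are all assumptions made for illustration.

```typescript
import { createHmac } from "node:crypto";

// Assumption: a real deployment would load this secret from configuration.
// The value here is only a placeholder.
const EMAIL_HASH_SECRET = "placeholder-secret-for-illustration";

// Deterministic keyed hash of a normalized email (HMAC-SHA-256, hex encoded).
function emailFingerprint(email: string): string {
  const normalized = email.trim().toLowerCase();
  return createHmac("sha256", EMAIL_HASH_SECRET).update(normalized).digest("hex");
}

// In-memory stand-in for a database column with a UNIQUE constraint.
const registeredEmailHashes = new Set<string>();

// Returns true if the account was created, false if the email is already taken.
function register(email: string): boolean {
  const fingerprint = emailFingerprint(email);
  if (registeredEmailHashes.has(fingerprint)) return false;
  registeredEmailHashes.add(fingerprint);
  return true;
}

console.log(register("user@example.com"));   // true
console.log(register(" User@Example.com ")); // false, same email after normalization
console.log(register("other@example.com"));  // true
```

Because the fingerprint depends only on the normalized email and the secret, the second call finds the entry stored by the first one.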
How Password Verification Still Works
This led me to another question. If password hashes are random every time, how can we possibly verify a user’s old password when they try to log in or reset it?
Fortunately, the answer is built into modern password hashing. The random salt used for a password is not stored separately; it is embedded directly inside the final hash string.
A password hash looks something like this:
$argon2id$[settings]$[THE_RANDOM_SALT]$[THE_FINAL_HASH]
When you use a verification function like Bun.password.verify(), it performs these steps:
- It takes the full hash string from the database.
- It extracts the embedded salt from that string.
- It uses that exact same salt to hash the password the user just typed.
- It compares the result with the hash part of the stored string. If they match, the password is correct.
This is why we can have random, secure password hashes while still being able to verify them reliably.
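The steps above can be sketched by hand. This is not how Bun.password is implemented internally; it is a simplified illustration that uses scrypt from node:crypto and a made-up "salt$hash" storage format, with hypothetical function names, to show why an embedded salt is enough for verification.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LENGTH = 32; // bytes of derived key stored per password

// Illustrative format: "<saltHex>$<hashHex>". Real argon2 or bcrypt strings
// also carry the algorithm name and cost settings.
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString("hex")}$${hash.toString("hex")}`;
}

function verifyPassword(password: string, stored: string): boolean {
  // Take the stored string and pull the embedded salt out of it.
  const [saltHex, hashHex] = stored.split("$");
  if (!saltHex || !hashHex) return false;
  const salt = Buffer.from(saltHex, "hex");
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length !== KEY_LENGTH) return false; // malformed record
  // Hash the typed password with that same salt.
  const actual = scryptSync(password, salt, KEY_LENGTH);
  // Compare in constant time to avoid leaking how many bytes matched.
  return timingSafeEqual(actual, expected);
}

const stored = hashPassword("correct horse");
console.log(verifyPassword("correct horse", stored)); // true
console.log(verifyPassword("wrong guess", stored));   // false
```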
When Should You Use This Method?
Now for an important disclaimer. Hashing emails this way is not for every application. Most platforms, like e-commerce sites or social networks, need the user’s actual email for sending notifications, marketing, and crucial password reset links. For those applications, storing the email in a recoverable form (typically plaintext, protected by other security measures) is necessary.
However, this privacy-focused method is incredibly useful for specific types of platforms:
- Anonymous Forums and Imageboards: Sites like 4chan, where user anonymity is a core feature, can protect their users’ identities by not storing emails in a recoverable format.
- Pastebin Services: Users can create accounts to manage their pastes without needing to expose their email addresses.
- Privacy-Centric Platforms: Any platform where the primary goal is to minimize the collection of personally identifiable information.
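Even with hashed emails, password resets are still possible. You can either require the user to provide their old password for verification or, in more advanced setups, have the user type their email into the reset form. The server hashes what they typed, compares it with the stored hash, and, on a match, sends the reset link to the address that was just submitted. The hash is one-way, so nothing is ever decrypted, and the plaintext email is never stored.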
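A minimal sketch of that email-based reset flow, under several assumptions: hashEmail here is a synchronous equivalent of the function shown earlier, the two Maps stand in for database tables, and requestPasswordReset and the SendMail callback are hypothetical names rather than part of the project.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Placeholder secret; assumed to live in configuration in practice.
const FIXED_SALT = "placeholder-secret-for-illustration";

// Deterministic email hash: SHA-256 over the fixed salt plus normalized email.
function hashEmail(email: string): string {
  const normalized = email.toLowerCase().trim();
  return createHash("sha256").update(FIXED_SALT + normalized).digest("hex");
}

// In-memory stand-ins for the users table and for pending reset tokens.
const userIdByEmailHash = new Map<string, number>();
const userIdByResetToken = new Map<string, number>();

// Hypothetical mailer signature: receives the address and the reset token.
type SendMail = (to: string, token: string) => void;

function requestPasswordReset(submittedEmail: string, sendMail: SendMail): void {
  const userId = userIdByEmailHash.get(hashEmail(submittedEmail));
  // Stay silent for unknown emails so the form does not reveal who has an account.
  if (userId === undefined) return;
  const token = randomBytes(32).toString("hex");
  userIdByResetToken.set(token, userId);
  // The plaintext address exists only for the duration of this request.
  sendMail(submittedEmail, token);
}

// Usage: one registered user, then a reset request with different casing.
userIdByEmailHash.set(hashEmail("user@example.com"), 1);
requestPasswordReset("User@Example.com", (to, token) => {
  console.log(`would send a reset link to ${to} (token length ${token.length})`);
});
```

A production version would also need token expiry and rate limiting, which are left out here to keep the sketch short.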
This entire experiment showed me that with a few careful considerations, it is possible to build a system that respects user privacy on a deeper level, an important lesson in today’s world of frequent data breaches.
Live Demo & Source Code
- Live Demo - hash-auth.pujan.pm.
- Source Code - github.com/pujan-modha/hash-auth.