Substitution Cipher and Frequency Analysis

QRPBQR GUVF VS LBH PNA

How I Stumbled on Cryptography

This semester I am studying Information Security Systems, and I have discovered the captivating world of cryptography. Along the way I encountered various encryption techniques that blend logical precision with creative problem solving.


Discovering the Caesar Cipher: The Salad(?) of My Curiosity

One of the first methods I explored was the Caesar cipher. Named after Julius Caesar (not the salad), who famously used it to exchange secret messages, it works by shifting every letter in the plaintext forward by a fixed number of positions in the alphabet.

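For example, with a shift of 3, A becomes D, B becomes E, and so on, wrapping around at the end of the alphabet so that X, Y, and Z become A, B, and C. Under this shift, "hello" encrypts to "khoor".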

Decrypting is simply a matter of shifting in the opposite direction. Although basic, this cipher offered early insights into how simple algorithms can protect information and sparked my curiosity to learn more.
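To make the shift concrete, here is a minimal Python sketch. It assumes lowercase input and passes spaces and punctuation through unchanged; the name caesar_shift is my own, not from any library. Decryption reuses the same function with a negative shift.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(text: str, shift: int) -> str:
    """Shift each lowercase letter forward by `shift` positions, wrapping
    around after 'z'. Non-letters (spaces, punctuation) pass through unchanged.
    A negative shift moves letters backward, which is how we decrypt."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            index = (ALPHABET.index(ch) + shift) % 26  # wrap around the alphabet
            result.append(ALPHABET[index])
        else:
            result.append(ch)
    return "".join(result)


ciphertext = caesar_shift("hello", 3)
print(ciphertext)                    # khoor
print(caesar_shift(ciphertext, -3))  # hello
```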


Evolving Beyond Salads: Embracing More Complex Ciphers

Building on this foundation, I delved into more sophisticated methods.

These techniques, while seemingly robust, turn out to be vulnerable to frequency analysis: by comparing the distribution of letters in the ciphertext with that of typical English text, an attacker can expose the flaws in simple substitution schemes.
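As a rough illustration of the first step of frequency analysis, the sketch below counts how often each letter appears in a ciphertext and reports it as a percentage, so it can be compared with English text. The helper letter_frequencies is hypothetical, written just for this post, and it ignores anything that is not a letter from a to z.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in `text`, as a percentage,
    ordered from most to least common. Only a-z are counted."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {letter: 100 * n / total for letter, n in counts.most_common()}


freqs = letter_frequencies("wklv lv d whvw phvvdjh")
for letter, pct in freqs.items():
    print(f"{letter}: {pct:.1f}%")
```

Run on the Caesar-shifted text "wklv lv d whvw phvvdjh", v comes out on top with 5 of the 18 letters, because it stands in for the plaintext s.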


Challenge: Cracking a Random Substitution Cipher

Driven by curiosity and a desire to test my analytical skills, I took on the challenge of decrypting a text that was encoded with a random one-to-one letter substitution. Using frequency analysis, I mapped the distribution of letters in the ciphertext to standard English letter frequencies to reveal the hidden message.

Problem Statement: Cracking a Random Substitution Cipher

You are given a large encrypted text in which every letter has been replaced with another letter according to a one-to-one mapping. The goal is to decrypt the text based on established English letter frequencies.

The ciphertext is provided as a string of lowercase letters; as the examples below show, the spaces between words are left unencrypted. Although the precise mapping is unknown, it remains consistent throughout the text.

You might wonder how to tell the other party which letter replaces which. The answer is simple: share the key, a string of 26 letters (strictly, 25 are enough, since the last letter is implied by the others). For example, the key qwertyuiopasdfghjklzxcvbnm means that the letter ‘a’ is replaced with ‘q’, the letter ‘b’ is replaced with ‘w’, and so on.
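In Python, the built-in str.maketrans and str.translate handle this kind of fixed letter-for-letter mapping directly. The sketch below assumes lowercase text and a key that is a permutation of all 26 letters; make_tables is a hypothetical helper name.

```python
import string

ALPHABET = string.ascii_lowercase


def make_tables(key: str) -> tuple[dict[int, int], dict[int, int]]:
    """Build str.translate tables from a 26-letter key, where key[i] is the
    ciphertext letter that replaces ALPHABET[i]."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must be a permutation of the 26 lowercase letters")
    encrypt_table = str.maketrans(ALPHABET, key)
    decrypt_table = str.maketrans(key, ALPHABET)  # the inverse mapping
    return encrypt_table, decrypt_table


KEY = "qwertyuiopasdfghjklzxcvbnm"
enc, dec = make_tables(KEY)

secret = "the quick brown fox".translate(enc)
print(secret)                 # zit jxoea wkgvf ygb
print(secret.translate(dec))  # the quick brown fox
```

Because translate leaves characters outside the table alone, the spaces between words pass through unencrypted. Decryption simply uses the inverse table built from the same key.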


Input & Output Format

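Input: ciphertext, a string of lowercase letters in which the spaces between words are left unencrypted.

Output: the decrypted plaintext, as a string.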

Constraints:


Reference: English Letter Frequencies

Letter  Frequency    Letter  Frequency
A          8.167%    N          6.749%
B          1.492%    O          7.507%
C          2.782%    P          1.929%
D          4.253%    Q          0.095%
E         12.702%    R          5.987%
F          2.228%    S          6.327%
G          2.015%    T          9.056%
H          6.094%    U          2.758%
I          6.966%    V          0.978%
J          0.153%    W          2.360%
K          0.772%    X          0.150%
L          4.025%    Y          1.974%
M          2.406%    Z          0.074%

The English letter frequencies provided above are based on commonly accepted estimates. While more precise sources may exist, the differences are generally minor and unlikely to significantly affect decryption outcomes. That said, if you are working with especially large ciphertexts or require greater accuracy, it might be beneficial to consult additional research to verify the frequency data.
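A natural first attempt at the problem, under the simplifying assumption that the ciphertext's letter ranking matches the English ranking exactly, is to map the most frequent ciphertext letter to e, the next to t, and so on down the table above. The function guess_by_frequency below is a hypothetical sketch of that idea, not a complete solver.

```python
from collections import Counter

# English letter frequencies in percent, taken from the reference table above.
ENGLISH_FREQ = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}

# English letters from most to least common: e, t, a, o, i, n, s, h, ...
ENGLISH_ORDER = sorted(ENGLISH_FREQ, key=ENGLISH_FREQ.get, reverse=True)


def guess_by_frequency(ciphertext: str) -> str:
    """First-pass decryption (hypothetical helper): assume the k-th most
    common ciphertext letter stands for the k-th most common English letter.
    Characters that are not lowercase letters, such as spaces, are kept."""
    counts = Counter(ch for ch in ciphertext if "a" <= ch <= "z")
    cipher_order = [letter for letter, _ in counts.most_common()]
    mapping = dict(zip(cipher_order, ENGLISH_ORDER))
    return "".join(mapping.get(ch, ch) for ch in ciphertext)


print(guess_by_frequency("zzzyyx"))  # eeetta
```

On real inputs this only gives a first draft: letters with similar frequencies (such as h, r, and s) are easily swapped, so the result usually needs manual or automated refinement, and it works far better on long ciphertexts than on short ones.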


Example Cases

Example 1

Input:

ciphertext = "wklv lv d whvw phvvdjh"

Output:

"this is a test message"

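Explanation: The given text was encrypted with a Caesar cipher using a shift of 3, so shifting every letter back by 3 positions (w → t, k → h, l → i, v → s) recovers the plaintext.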
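Because a Caesar cipher has only 26 possible shifts, it can also be broken without guessing: try every shift and keep the candidate whose letter distribution is closest to English. The sketch below scores candidates with the chi-squared statistic, which is one common choice for this comparison; the scoring method and the function names (shift_text, chi_squared, crack_caesar) are my own assumptions, not the only way to do it.

```python
import string

ALPHABET = string.ascii_lowercase

# English letter frequencies in percent for a..z, from the reference table above.
ENGLISH_FREQ = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]


def shift_text(text: str, shift: int) -> str:
    """Caesar-shift lowercase letters by `shift`; other characters pass through."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def chi_squared(text: str) -> float:
    """Chi-squared distance between the letter counts in `text` and the counts
    expected for English text of the same length (lower = more English-like)."""
    letters = [ch for ch in text if ch in ALPHABET]
    n = len(letters)
    if n == 0:
        return 0.0
    score = 0.0
    for letter, percent in zip(ALPHABET, ENGLISH_FREQ):
        expected = n * percent / 100
        observed = letters.count(letter)
        score += (observed - expected) ** 2 / expected
    return score


def crack_caesar(ciphertext: str) -> tuple[int, str]:
    """Try all 26 shifts; return (shift, plaintext) with the lowest score."""
    candidates = [(s, shift_text(ciphertext, -s)) for s in range(26)]
    return min(candidates, key=lambda c: chi_squared(c[1]))


print(crack_caesar("wklv lv d whvw phvvdjh"))  # (3, 'this is a test message')
```

On this example the correct shift wins clearly, but with very short inputs any frequency-based score can occasionally prefer a wrong shift.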

Example 2

Input:

ciphertext = "zit jxoea wkgvf ygb"

Output:

"the quick brown fox"

Explanation: Here, letters are randomly mapped. We decode by matching letter frequencies with real-world English frequencies.

Short inputs like this one are the hardest case: with only a handful of letters, the observed frequencies can differ a lot from typical English, so frequency analysis becomes unreliable. As the input grows, the observed frequencies move closer to the English averages, and decryption becomes much easier.


Why This Problem Matters

This problem is not just a fun cryptography puzzle; it also teaches an important lesson in security. Many early encryption methods relied on simple letter substitutions, but they failed because a substitution preserves the statistical patterns of the underlying language. Even modern cryptographic techniques, like AES and RSA, have to ensure that they don’t leave behind identifiable patterns.

By working on this problem, you can experience firsthand how analyzing patterns in data can lead to breaking encryption. This is the foundation of cryptanalysis, a field that has shaped cybersecurity and even world history (think of the Enigma machine in WWII).


References & Further Reading