How I Stumbled on Cryptography
This semester I am studying Information Security Systems, and along the way I discovered the captivating world of modern cryptography. In my studies I encountered various encryption techniques that blend logical precision with creative problem solving.
Discovering the Caesar Cipher: The Salad(?) of My Curiosity
One of the first methods I explored was the Caesar cipher. Named after ~~Caesar Salad~~ Julius Caesar, who famously used this method to exchange ~~recipes for a salad~~ secret messages, it involves shifting every letter in the plaintext forward by a fixed number of positions in the alphabet.
For example, with a shift of 3:
- ‘A’ becomes ‘D’
- ‘B’ becomes ‘E’
- ‘C’ becomes ‘F’
Decrypting is simply a matter of shifting in the opposite direction. Although basic, this cipher offered early insights into how simple algorithms can protect information and sparked my curiosity to learn more.
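To make the shifting concrete, here is a minimal Python sketch of the idea. The function name `caesar_shift` is my own choice for illustration, and I assume lowercase text, with any other character (such as a space) passed through unchanged.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(text: str, shift: int) -> str:
    """Shift every lowercase letter by `shift` positions, wrapping around after 'z'."""
    k = shift % 26
    shifted = ALPHABET[k:] + ALPHABET[:k]
    # Characters outside a-z are not in the table, so they are left as they are.
    return text.translate(str.maketrans(ALPHABET, shifted))


encrypted = caesar_shift("abc", 3)
decrypted = caesar_shift(encrypted, -3)  # decrypt by shifting the other way
print(encrypted, decrypted)  # def abc
```

Using a negative shift for decryption mirrors the "opposite direction" description above, so one function covers both directions.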
Evolving Beyond Salads: Embracing More Complex Ciphers
Building on this foundation, I delved into more sophisticated methods such as:
- The Vigenère cipher, which uses a repeating keyword to pick a different shift for each letter position, so the same plaintext letter can encrypt to several different ciphertext letters.
- Mono-alphabetic substitution ciphers, where every letter is replaced by a unique counterpart selected by a random mapping.
These techniques, while seemingly robust, reveal vulnerabilities when subjected to frequency analysis (for Vigenère, this applies once the keyword length has been worked out). By comparing the distribution of letters in the ciphertext to that of typical English text, it becomes possible to recover the mapping used by a simple substitution scheme.
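As a rough illustration of the Vigenère idea, the sketch below shifts each letter by the amount given by the matching keyword letter. The function name `vigenere_encrypt` and the sample keyword are my own assumptions, and the code only handles lowercase letters.

```python
from itertools import cycle


def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Shift each lowercase letter by the next keyword letter (a=0, b=1, ...)."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    shifts = cycle(ord(k) - ord("a") for k in keyword)
    out = []
    for ch in plaintext:
        if "a" <= ch <= "z":
            offset = (ord(ch) - ord("a") + next(shifts)) % 26
            out.append(chr(ord("a") + offset))
        else:
            out.append(ch)  # spaces and other characters do not consume a key letter
    return "".join(out)


print(vigenere_encrypt("attackatdawn", "lemon"))  # lxfopvefrnhr
```

In this example the letter 'a' turns into 'l', 'o', 'e', and 'n' at different positions, which is why a plain letter count over the whole ciphertext is less revealing than it is for a single fixed mapping.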
Challenge: Cracking a Random Substitution Cipher
Driven by curiosity and a desire to test my analytical skills, I took on the challenge of decrypting a text that was encoded with a random one-to-one letter substitution. Using frequency analysis, I mapped the distribution of letters in the ciphertext to standard English letter frequencies to reveal the hidden message.
Problem Statement: Cracking a Random Substitution Cipher
You are given a large encrypted text in which every letter has been replaced with another letter according to a one-to-one mapping. The goal is to decrypt the text based on established English letter frequencies.
The ciphertext is provided as a single string of lowercase letters, with spaces between words. Although the precise mapping is unknown, it remains consistent throughout the text.
You might wonder how to tell the other party which letter replaces which. The answer is simple: share a key of length 26 with them (25 letters are enough, since the last one is determined by the rest). For example, the key
qwertyuiopasdfghjklzxcvbnm
means that the letter ‘a’ is replaced with ‘q’, the letter ‘b’ is replaced with ‘w’, and so on.
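Here is a small sketch of how such a key could be applied in Python. The helper names `substitute` and `unsubstitute` are hypothetical, and the code assumes the key is a permutation of the 26 lowercase letters.

```python
import string

ALPHABET = string.ascii_lowercase
KEY = "qwertyuiopasdfghjklzxcvbnm"  # the example key from the text


def substitute(text: str, key: str) -> str:
    """Encrypt: replace 'a' with key[0], 'b' with key[1], and so on."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must use each lowercase letter exactly once")
    return text.translate(str.maketrans(ALPHABET, key))


def unsubstitute(text: str, key: str) -> str:
    """Decrypt: invert the mapping, which is possible because it is one-to-one."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must use each lowercase letter exactly once")
    return text.translate(str.maketrans(key, ALPHABET))


secret = substitute("the quick brown fox", KEY)
print(secret)                      # zit jxoea wkgvf ygb
print(unsubstitute(secret, KEY))   # the quick brown fox
```

The receiver only needs the same key to undo the substitution. The challenge below is about recovering the plaintext without it.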
Input & Output Format
Input:
- ciphertext: A string (1 ≤ len(ciphertext) ≤ 100,000) containing only lowercase English letters (a to z); there is no punctuation, but spaces separate the words.
- Sample Input: ciphertext.txt - Random Mapping (one-to-one)
Output:
- The decrypted plaintext as a string.
Constraints:
- The text consists solely of lowercase English letters and spaces.
- The mapping is strictly one-to-one.
- The ciphertext will not exceed 100,000 characters.
- The original text follows the typical frequency patterns of the English language.
Reference: English Letter Frequencies
Letter | Frequency | Letter | Frequency |
---|---|---|---|
A | 8.167% | N | 6.749% |
B | 1.492% | O | 7.507% |
C | 2.782% | P | 1.929% |
D | 4.253% | Q | 0.095% |
E | 12.702% | R | 5.987% |
F | 2.228% | S | 6.327% |
G | 2.015% | T | 9.056% |
H | 6.094% | U | 2.758% |
I | 6.966% | V | 0.978% |
J | 0.153% | W | 2.360% |
K | 0.772% | X | 0.150% |
L | 4.025% | Y | 1.974% |
M | 2.406% | Z | 0.074% |
The English letter frequencies provided above are based on commonly accepted estimates. While more precise sources may exist, the differences are generally minor and unlikely to significantly affect decryption outcomes. That said, if you are working with especially large ciphertexts or require greater accuracy, it might be beneficial to consult additional research to verify the frequency data.
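One possible first pass at the problem, sketched in Python, is to rank the ciphertext letters by how often they occur and pair them with the English letters ranked by the table above. This is only a starting guess under my own assumptions, not a full solver: the names `guess_mapping` and `apply_mapping` are hypothetical, and ties are broken alphabetically for lack of a better rule.

```python
import string
from collections import Counter

# English letters from most to least frequent, read off the table above.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_mapping(ciphertext: str) -> dict[str, str]:
    """Pair the n-th most common cipher letter with the n-th most common English letter."""
    counts = Counter(ch for ch in ciphertext if ch in string.ascii_lowercase)
    # Sort by descending count; letters that never appear have count 0 and end up last.
    ranked = sorted(string.ascii_lowercase, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


def apply_mapping(ciphertext: str, mapping: dict[str, str]) -> str:
    """Translate letters through the mapping and keep spaces as they are."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext)


# Toy input: 'x' is most common, then 'y', then 'z'.
toy = "xxx yy z"
print(apply_mapping(toy, guess_mapping(toy)))  # eee tt a
```

On real text this first guess usually gets only some letters right, because letters with similar frequencies swap places easily. In practice the guess would then be refined, for instance by looking at common short words or letter pairs.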
Example Cases
Example 1
Input:
ciphertext = "wklv lv d whvw phvvdjh"
Output:
"this is a test message"
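Explanation: The text was encrypted with a Caesar shift of 3, which is a special case of a substitution cipher:
- w -> t, k -> h, l -> i, and so on. The text is too short for frequency counts to be reliable, but a Caesar cipher has only 26 possible shifts to try.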
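Because a Caesar cipher has only 26 possible keys, Example 1 can also be solved by simply trying every shift and reading the results. The sketch below does that; `caesar_candidates` is a name I made up for illustration.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_candidates(ciphertext: str) -> list[str]:
    """Return all 26 possible decryptions; index i undoes a forward shift of i."""
    candidates = []
    for shift in range(26):
        rotated = ALPHABET[shift:] + ALPHABET[:shift]
        # Map each rotated letter back to its original position in the alphabet.
        candidates.append(ciphertext.translate(str.maketrans(rotated, ALPHABET)))
    return candidates


candidates = caesar_candidates("wklv lv d whvw phvvdjh")
print(candidates[3])  # this is a test message
```

A random one-to-one mapping does not allow this shortcut, since there are 26! (roughly 4 × 10^26) possible keys, which is why frequency analysis is needed there.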
Example 2
Input:
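ciphertext = "zit jxoea wkgvf ygb"
Output:
"the quick brown fox"
Explanation: Here the letters follow an arbitrary one-to-one mapping (this ciphertext uses the example key qwertyuiopasdfghjklzxcvbnm from the problem statement). We decode by matching letter frequencies in the ciphertext with real-world English frequencies.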
When the input is as small as it is here, decryption is harder because the observed letter frequencies are unreliable. When the input is large, the frequency analysis becomes more accurate and decryption becomes easier.
Why This Problem Matters
This problem is not just a fun cryptography puzzle; it also teaches an important lesson in security. Many early encryption methods relied on simple letter substitutions, but they failed because patterns always emerge in any structured language. Even modern cryptographic techniques, like AES and RSA, have to ensure that they don’t leave behind identifiable patterns.
By working on this problem, you can experience firsthand how analyzing patterns in data can lead to breaking encryption. This is the foundation of cryptanalysis, a field that has shaped cybersecurity and even world history (think of the Enigma machine in WWII).
References & Further Reading
- William Stallings - Cryptography and Network Security
- Simon Singh - The Code Book
- Bruce Schneier - Applied Cryptography
- Cryptanalysis and Frequency Analysis