I got to thinking that maybe one could compare the known K4 letter frequencies against what you would expect if K4 followed standard English plaintext frequencies. I used a Blackchamber letter frequency page as a resource to speed up the analysis.
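Standard plaintext frequency x 97 (the length of K4) = expected count, set against the actual K4 count for that letter, then the possible substitute based on K4 frequency rank.
Columns: plaintext letter, standard frequency, expected count, actual K4 count, possible K4 substitute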
e 0.12702 12.32 2 k
t 0.09056 8.78 6 s
a 0.08167 7.92 4 t
o 0.07507 7.3 5 u
i 0.06966 6.75 4 o
n 0.06749 6.55 3 b
s 0.06327 6.14 6 w
h 0.06094 5.91 2 f
r 0.05987 5.8 4 q
d 0.04253 4.13 3 r
l 0.04000 3.88 4 g
c 0.02782 2.7 2 i
u 0.02758 2.68 6 a
m 0.02406 2.33 1 l
w 0.02360 2.29 5 z
f 0.02228 2.16 4 p
g 0.02015 1.95 4 j
y 0.01974 1.91 1 n
p 0.01929 1.87 3 d
b 0.01492 1.45 5 h
v 0.00978 0.95 2 v
k 0.00772 0.75 8 c
j 0.00153 0.15 3 x
x 0.00150 0.15 2 e
q 0.00095 0.09 4 y
z 0.00074 0.07 4 m
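For anyone who wants to redo the table without a calculator, here is a minimal Python sketch of the same arithmetic. The `K4` string is the commonly published transcription of the ciphertext (my assumption, I didn't quote it above), and the function names are made up for illustration. Ties between equally common K4 letters are broken alphabetically here, so the substitute column can differ from mine inside a tie group.

```python
from collections import Counter

# Commonly published K4 transcription (an assumption, not quoted in the post).
K4 = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSS"
      "OTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPK"
      "WGDKZXTJCDIGKUHUAUEKCAR")

# Standard English frequencies exactly as listed in the table above.
ENGLISH_FREQ = {
    "e": 0.12702, "t": 0.09056, "a": 0.08167, "o": 0.07507, "i": 0.06966,
    "n": 0.06749, "s": 0.06327, "h": 0.06094, "r": 0.05987, "d": 0.04253,
    "l": 0.04000, "c": 0.02782, "u": 0.02758, "m": 0.02406, "w": 0.02360,
    "f": 0.02228, "g": 0.02015, "y": 0.01974, "p": 0.01929, "b": 0.01492,
    "v": 0.00978, "k": 0.00772, "j": 0.00153, "x": 0.00150, "q": 0.00095,
    "z": 0.00074,
}


def expected_vs_actual(ciphertext, freqs):
    """Return (letter, expected count, actual count) rows, most common English letter first."""
    counts = Counter(ciphertext.lower())
    n = len(ciphertext)
    ordered = sorted(freqs.items(), key=lambda kv: -kv[1])
    return [(letter, round(freq * n, 2), counts.get(letter, 0))
            for letter, freq in ordered]


def rank_substitutes(ciphertext, freqs):
    """Pair the n-th most common English letter with the n-th most common ciphertext letter."""
    counts = Counter(ciphertext.lower())
    plain_ranked = sorted(freqs, key=lambda c: -freqs[c])
    # Ties in the ciphertext counts are broken alphabetically (an arbitrary choice).
    cipher_ranked = sorted(freqs, key=lambda c: (-counts.get(c, 0), c))
    return dict(zip(plain_ranked, cipher_ranked))


if __name__ == "__main__":
    subs = rank_substitutes(K4, ENGLISH_FREQ)
    for letter, expected, actual in expected_vs_actual(K4, ENGLISH_FREQ):
        print(letter, ENGLISH_FREQ[letter], expected, actual, subs[letter])
```

Because the tie-breaking is arbitrary, treat the substitute column as one of many equally plausible pairings rather than a unique answer.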
K4 translated by its frequency alone: (I took some liberties with spacing, punctuation and capitalization)
In Edo, I, Jil, bomn Tim. Chnn shmd vrrf dylet ti as art grt TX ewws. U age mop cuscy hny qf. Vaaz whfeslp ewjagkp cleo bou ox ekud.
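If you want to check the translation yourself, a rough sketch like this applies the substitute column of the table to K4. The `K4` string is again the commonly published transcription (an assumption on my part), and `PLAIN`, `CIPHER` and `translate` are just illustrative names. It gives the raw letter stream with no spacing or punctuation.

```python
# Commonly published K4 transcription (an assumption, not quoted in the post).
K4 = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSS"
      "OTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPK"
      "WGDKZXTJCDIGKUHUAUEKCAR")

# First column of the table (plaintext letters, most common first) ...
PLAIN = "etaoinshrdlcumwfgypbvkjxqz"
# ... and last column (the K4 letter guessed to stand for each of them).
CIPHER = "kstuobwfqrgialzpjndhvcxeym"

# Translation table mapping each ciphertext letter to its guessed plaintext letter.
TABLE = str.maketrans(CIPHER, PLAIN)


def translate(ciphertext):
    """Replace every ciphertext letter with its frequency-rank guess."""
    return ciphertext.lower().translate(TABLE)


if __name__ == "__main__":
    print(translate(K4))
```

The output starts `inedoijilbomntim`, which is where "In Edo, I, Jil, bomn Tim" comes from.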
I know there are wonderful statistical methods that describe the nature of a ciphertext and can then estimate the likelihood that it is this or that type of cipher. I would direct your interest to the Army manual or cryptology books available at your library/bookstore. Why? Well, I’m not that good at those calculations and it might get a little obscure for some of us, so let’s just call the departure from the standard frequency profile "distortion".
- With no distortion, you can be sure it’s either plain English and you are not much of a reader or that it has been transposed.
- With no change in peak and valley heights, just in their distribution across the alphabet, well my friend, that is a sure sign of substitution.
- With lowered peaks and raised valleys it is probably polyalphabetic substitution or may include other things.
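One of those statistical methods is simple enough to sketch: the index of coincidence, which puts a single number on how peaky or flat a letter distribution is. To be clear, this is a standard textbook statistic I'm swapping in for my hand-wavy "distortion", not something used in the table above. The `K4` string is the commonly published transcription (an assumption) and the function name is illustrative. English text and simple substitutions or transpositions of it typically score around 0.067, while perfectly even letter use scores 1/26, about 0.038.

```python
from collections import Counter

# Commonly published K4 transcription (an assumption, not quoted in the post).
K4 = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSS"
      "OTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPK"
      "WGDKZXTJCDIGKUHUAUEKCAR")


def index_of_coincidence(text):
    """Probability that two letters drawn at random (without replacement) are the same."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    # Sum of n_i * (n_i - 1) over all letters, divided by N * (N - 1).
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    print(round(index_of_coincidence(K4), 4))
```

For this transcription the value works out to 336/9312, roughly 0.036, which is down at the flat end of the scale rather than the English end.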
With K4, most of the letter counts are too high or too low compared with the expected values. That’s why we’re all so excited about possible Polybius squares or four-square or digraphic substitution. The problem is that no one has been able to get one of them to work.
So we are left with several possibilities:
- it is a form of substitution no one has tried correctly
- it is a combination of substitution and transposition
- the letters of K4 have been transformed and we need the translated version before cryptanalytic methods can be applied to solve it
- not all of the letters of K4 are important
- only a portion of the letters have been modified
- there are two types of ciphering that have been layered together and we need to separate or divide K4 into segments to solve
How are we supposed to know how to solve it? Supposedly we’ve been given clues.
Why did you do all that business with the numbers and the frequencies if you knew it was bunk? I was secretly hoping a pattern would emerge in which letters were silenced or emphasized, one that would suggest the masking method.
At least we know we’re looking for some kind of masking/cipher that evens out the letter frequencies.