The Dancing Men
In Arthur Conan Doyle's "The Adventure of the Dancing Men,"
Doctor Watson was once again amazed at his companion's penetrating
insights, this time into the cryptic messages hidden in the dancing
men, a seeming child's scrawl of dancing stick figures.
When first presented with the mystery, Holmes could do nothing---a short
encrypted message could mean anything: "These hieroglyphics have
evidently a meaning. If it is a purely arbitrary one, it may be
impossible for us to solve it. If, on the other hand, it is
systematic, I have no doubt that we shall get to the bottom of it."
Once presented with a few more secret messages, however, he rapidly
broke the encryption, for the conspirators had foolishly used a
simple encryption scheme. At the risk of spoiling the story of the
dancing men, I'll give Holmes's description of how he decrypted the
messages:
"The first message submitted to me was
so short that it was impossible for me to do more than say, with
some confidence, that the symbol stood for E."
Clever Holmes had seen, just as Young had with the Rosetta Stone,
that changing each letter (or word) into another symbol may make the
message look odd, but it doesn't change each letter's frequency.
Moreover, like Scrabble players today, he knew that e is the most
common letter in English. So he simply found the most frequent
dancing man and assumed that it stood for e.
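Holmes's first step can be sketched in a few lines of Python. This is only an illustration, not anything from the story: the sample sentence, the substitution key, and the function name most_common_symbol are all made up for the example, with ordinary keyboard symbols standing in for the dancing men.

```python
from collections import Counter

def most_common_symbol(ciphertext):
    """Return the most frequent non-space symbol in the ciphertext."""
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    symbol, _ = counts.most_common(1)[0]
    return symbol

# Hypothetical message and key: each letter is swapped for an arbitrary symbol.
plaintext = "never send these letters here"
key = str.maketrans("nevrsdthl", "@#$%&*+=?")
ciphertext = plaintext.translate(key)

# The substitution hides the letters but not how often each one occurs,
# so the most common symbol is the natural first guess for e.
guess_for_e = most_common_symbol(ciphertext)
print(ciphertext)
print("Symbol guessed to stand for e:", guess_for_e)
```

On a message this short the guess can easily be wrong, which is why Holmes claimed only "some confidence" until more messages arrived.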
"Now, in the single word I have already got the two E's coming second and fourth in a
word of five letters. It might be 'sever,' or 'lever,' or 'never.'
There can be no question that the latter as a reply is most
probable.... Accepting it as correct, we are now able to say that
the symbols stand respectively for N, V, and R."
And so, by bits and pieces, Holmes broke the encryption and unmasked the clueless conspirators.
Everyone keeps secrets. Over two thousand years
ago, for example, Julius Caesar encrypted messages to his generals
far afield by cyclically mapping letters to the third letter on in
the alphabet: a became d, b became e, and so on, and z became c.
Thus, the message 'attack at dawn' would become 'dwwdfn dw gdzq.'
Over fifteen hundred years later, Mary, Queen of Scots, used a similar
substitution cipher to plot with Spain the assassination of her cousin, Queen
Elizabeth I. Sadly for Mary, Elizabeth's secret agents, who had
cleverly instigated the conspiracy in the first place, used a
statistical analysis to quickly break the encryption. So at eight in
the morning of Wednesday, February 8, 1587, Mary lost her head.
Governments around the world took note and never again used anything
as simple as a Caesar encryption. But while the new schemes they
came up with were more complex, they still only substituted and
rearranged letters and other symbols. Nobody could think of anything
better.
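Caesar's cyclic three-letter shift, described above, is simple enough to write down in a few lines of Python. This is a minimal sketch that assumes lowercase English letters only; the function name caesar is chosen for the example, and spaces and other characters pass through unchanged.

```python
import string

def caesar(text, shift=3):
    """Shift each lowercase letter cyclically by `shift` places; leave other characters alone."""
    alphabet = string.ascii_lowercase
    k = shift % 26
    shifted = alphabet[k:] + alphabet[:k]
    return text.translate(str.maketrans(alphabet, shifted))

encrypted = caesar("attack at dawn")   # 'dwwdfn dw gdzq'
decrypted = caesar(encrypted, -3)      # shifting back by three recovers the message
print(encrypted)
print(decrypted)
```

Because there are only twenty-five useful shifts, anyone who suspects the scheme can simply try them all, which is one reason it offers so little protection.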
So although practical secrecy advanced over the
centuries---particularly during the fury of technological
development we now call the Second World War---modern secrecy really
only started in 1948 when Claude Shannon, a brilliant researcher at
AT&T Bell Laboratories, put his finger on the real problem.
Shannon saw that roughly seven in every ten letters in a long
English message are redundant. For example, the three letters e, t,
and a alone account for well over a quarter of all letters used in
English. If English were not redundant, all letters would occur with
equal frequency.
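One rough way to put a number on this is to compare the information carried by the observed letter frequencies with what twenty-six equally likely letters would carry. The sketch below is an assumption-laden illustration, not Shannon's own calculation: it looks only at single-letter frequencies in whatever sample text it is given, and the function name single_letter_redundancy is invented for the example.

```python
import math
from collections import Counter

def single_letter_redundancy(text):
    """Estimate redundancy from single-letter frequencies alone: 1 - H / log2(26)."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        raise ValueError("text contains no letters")
    total = len(letters)
    counts = Counter(letters)
    # Entropy in bits per letter of the observed frequency distribution.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # With 26 equally likely letters the entropy would be log2(26) and the result 0.
    return 1 - entropy / math.log2(26)

sample = "the cat ate the mouse"
print(round(single_letter_redundancy(sample), 2))
```

A text in which every letter appears equally often scores zero. Measured this way, real English comes out well below seven in ten, because most of its redundancy comes from the context between letters and words rather than from letter frequencies alone.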
Similarly, the nine words the, of, and, to, it,
you, be, have, and will account for a quarter of all words used. It
says something about us that I, me, my, and mine didn't make the top
nine, but you did.
Because not all words and letters are equally
likely, each letter of a word, or each word of a sentence, builds
context for the next one, thereby reducing the choices. For this
reason, the first few letters of a word (the first few words of a
sentence, the first few sentences of a book) are usually more
important (that is, less redundant) than later ones.
For example,
in English q is almost always followed by u. Also, if a word starts with 'th',
the next letter is almost surely a vowel, the pseudovowel y, or an
r. If a word starts with 'the', the chances are almost one in two that the
next letter is r. And if a word starts with 'ther', the next letter is
almost surely a vowel or m. Similarly, if we hear the word 'the', we
usually expect the next word to be a noun, or the beginning of a
noun phrase. If we hear the phrase 'the cat ate the', we expect the
next word to be 'mouse'. Of course, the next word could just as well
be 'television'. The more previous context there is, the fewer the
possible continuations---and the greater our surprise when the next
word isn't what we expected.
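The way context narrows the choices can be seen by counting which letters follow a given prefix. The sketch below uses a tiny, hypothetical word list made up for the example, so its counts say nothing reliable about English as a whole; a real estimate would need a large dictionary or corpus. The function name next_letter_counts is likewise invented here.

```python
from collections import Counter

def next_letter_counts(words, prefix):
    """Count which letter follows `prefix` among the words that start with it."""
    return Counter(
        w[len(prefix)]
        for w in words
        if w.startswith(prefix) and len(w) > len(prefix)
    )

# Hypothetical toy word list, for illustration only.
words = ["the", "there", "these", "them", "then", "they", "their", "therefore",
         "this", "that", "three", "through", "thy", "queen", "quick", "quote"]

for prefix in ["q", "th", "the", "ther"]:
    print(prefix, dict(next_letter_counts(words, prefix)))
```

The longer the prefix, the fewer the words that still match it, and so the fewer the letters that can plausibly come next.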
Languages are so redundant because redundancy helps
us understand garbled communication: Lik* al* hu*an *ang*age*,
E*gli*h i* ex*rem*ly *edu*dan*. But that massive redundancy
makes long encrypted messages, even after extensive rearrangements and
replacements, pretty easy to break by statistical analysis.
Of course, even
a simple encryption can take time to break; for dramatic reasons,
Doyle shortened Holmes's task considerably. Besides the three
possibilities he identified, the word he interpreted as never could
have been aedes, bedew, beget, beret, or scores of others. Today
a computer can find all such possibilities in millionths of a second.
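The search for every word that fits Holmes's partial decryption can be sketched as follows. The short word list is a hypothetical stand-in for a full dictionary, and the function name candidates is invented for the example. A question mark marks a symbol not yet decoded, and the sketch assumes that an undecoded symbol cannot stand for a letter already identified.

```python
def candidates(words, pattern):
    """Return the words that fit `pattern`, where '?' marks an undecoded symbol."""
    known = set(pattern) - {"?"}
    result = []
    for word in words:
        if len(word) != len(pattern):
            continue
        fits = all(
            (c == p) if p != "?" else (c not in known)
            for p, c in zip(pattern, word)
        )
        if fits:
            result.append(word)
    return result

# Hypothetical word list; a real search would scan a complete dictionary.
words = ["sever", "lever", "never", "aedes", "bedew", "beget", "beret",
         "eerie", "other", "dance"]

print(candidates(words, "?e?e?"))
```

Choosing among the survivors still takes judgment, as it did for Holmes, who picked the word that made the most sense as a reply.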
Secret writers took all those lessons to heart
but didn't know what to do about them. The only answer seemed to be
to use their computers to pile on more and more rearrangements and
replacements in greater and greater profusion, hoping that the
secrecy breakers' computers weren't fast enough to keep up. But
given the way computers were improving, they knew that was a losing
proposition.
One If by Land, Two If by Sea