Chapter II

Principles of Mechanism of a Written Language

With a few exceptions, notably Chinese, all modern languages are constructed of words which in turn are formed from letters. In any given language the number of letters and their conventional order are fixed. Thus English is written with 26 letters and their conventional order is A, B, C, D, E, etc. Some letters are used very frequently and others rarely. In fact, if ten thousand consecutive letters of a text be counted and the frequency of occurrence of each letter be noted, the numbers found will be practically identical with those obtained from any other text of ten thousand letters in the same language. The relative proportion of occurrence of the various letters will also hold approximately for even very short texts.

Such a count of a large number of letters, when it is put in the form of a table, is known as a frequency table. Every language has its own distinctive frequency table and, for any given language, the frequency table is almost as fixed as the alphabet. There are minor differences in frequency tables prepared from texts on special subjects. For example, if the text be newspaper matter, the frequency table will differ slightly from one prepared from military orders and will also differ slightly from one prepared from telegraph messages. But these differences are very slight as compared with the differences between the frequency tables of two different languages.

Again there is a fixed ratio of occurrence of every letter with every other for any language and this, put in table form, constitutes a table of frequency of digraphs.
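The single-letter and digraph counts described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to prepare the tables of this chapter: the function names and the `per` scaling parameter are assumptions, and pairs are counted straight across word boundaries on the assumption that word forms have already been eliminated.

```python
from collections import Counter


def letters_only(text):
    """Keep only the letters A-Z, upper-cased; spaces and punctuation are dropped."""
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]


def frequency_table(text, per=10_000):
    """Single-letter counts scaled to `per` letters of text (hypothetical helper)."""
    letters = letters_only(text)
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: counts[ch] * per / total for ch in sorted(counts)}


def digraph_table(text, per=2_000):
    """Counts of adjacent letter pairs scaled to `per` letters of text.

    Assumption: pairs are counted across word boundaries.
    """
    letters = letters_only(text)
    total = len(letters)
    if total < 2:
        return {}
    pairs = Counter(a + b for a, b in zip(letters, letters[1:]))
    return {pair: n * per / total for pair, n in pairs.items()}


def order_of_letters(text):
    """Letters sorted from most to least frequent, as in an 'order of letters' line."""
    table = frequency_table(text)
    return sorted(table, key=table.get, reverse=True)
```

Scaling to a fixed number of letters makes counts from texts of different lengths directly comparable, which is why the tables of this chapter are stated per 10,000, 2,000 or 200 letters.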
In the same way a table of trigraphs, showing the ratio of occurrence of any three letters in sequence, could be prepared, but such a table would be very extensive and a count of the more common three-letter combinations is usually used.

Other tables, such as frequency of initial and final letters of words, might be of value but the common practice is to put cipher text into groups of five or ten letters each and eliminate word forms. This is almost a necessity in telegraphic and radio communication to enable the receiving operator to check correct receipt of a message. He must get five letters, neither more nor less, per word or he is sure a mistake has been made. There is little difficulty, as a rule, in restoring word forms in the deciphered message.

We will now take up, in order, the various frequency tables and linguistic peculiarities of English and Spanish. Frequency tables for French, German, Italian, and Portuguese for single letters will follow. All frequency tables have been re-calculated from at least ten thousand letters of text and compared with existing tables. No marked difference has been found in any case between the re-calculated tables and those already in use.

Data for Solution of Ciphers in English

Table I. Normal frequency table. Frequency for ten thousand letters and for two hundred letters. This latter is put in graphic form and is necessarily an approximation. Taken from military orders and reports, English text.

Letter   10,000 Letters   200 Letters
A             778         16  1111111111111111
B             141          3  111
C             296          6  111111
D             402          8  11111111
E            1277         26  11111111111111111111111111
F             197          4  1111
G             174          3  111
H             595         12  111111111111
I             667         13  1111111111111
J              51          1  1
K              74          2  11
L             372          7  1111111
M             288          6  111111
N             686         14  11111111111111
O             807         16  1111111111111111
P             223          4  1111
Q               8          -
R             651         13  1111111111111
S             622         12  111111111111
T             855         17  11111111111111111
U             308          6  111111
V             112          2  11
W             176          3  111
X              27          -
Y             196          4  1111
Z              17          -

Vowels AEIOU = 38.37%; consonants LNRST = 31.86%; consonants JKQXZ = 1.77%.

The vowels may be safely taken as 40%, consonants LNRST as 30% and consonants JKQXZ as 2%.

Order of letters: E T O A N I R S H D L U C M P F Y W G B V K J X Z Q.

Table II. Frequency table for telegraph messages, English text. This table varies slightly from the standard frequency table because the common word "the" is rarely used in telegrams and there is a tendency to use longer and less common words in preparing telegraph messages.

Letter   10,000 Letters   200 Letters
A             813         16  1111111111111111
B             149          3  111
C             306          6  111111
D             417          8  11111111
E            1319         26  11111111111111111111111111
F             205          4  1111
G             201          4  1111
H             386          8  11111111
I             711         14  11111111111111
J              42          1  1
K              88          2  11
L             392          8  11111111
M             273          6  111111
N             718         14  11111111111111
O             844         17  11111111111111111
P             243          5  11111
Q              38          1  1
R             677         14  11111111111111
S             656         13  1111111111111
T             634         13  1111111111111
U             321          6  111111
V             136          3  111
W             166          3  111
X              51          1  1
Y             208          4  1111
Z               6          -

In this table the vowels AEIOU = 40.08%, consonants LNRST = 30.77% and consonants JKQXZ = 2.25%.

Order of letters: E O A N I R S T D L H U C M P Y F G W B V K X J Q Z.

Table III. Table of frequency of digraphs, duals or pairs (English). This table was prepared from 20,000 letters, but the figures shown are on the basis of 2,000 letters. For this reason they are, to a certain extent, approximate; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal line at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus in two thousand letters we may expect to find AH once and HA twenty-six times.

   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A  17102232264227811291312924112
B  512111122131
C  61114211113231111
D  61230124301411113
E  111416122633102618141217361112216511
F  328212213253111
G  413211231
H  111241412112105032
I  21412651121598121312132223611
J  1
K  112211
L  146216111693633235
M  731322341104112
N  3832521313223943112
O  1112488312182478371315222615
P  218124232181431
Q  2111
R  16133403626121258228112
S  1613251217121127291161116
T  25131213523202124821620116227
U  1216132233117153551
V  31553251
W  128111124233
X  14211
Y  3224118121317
Z  111
   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Table IV. Order of frequency of common pairs to be expected in a count of 2,000 letters of military or semi-military English text. (Based on a count of 20,000 letters.)

TH  50      AT  25      ST  20
ER  40      EN  25      IO  18
ON  39      ES  25      LE  18
AN  38      OF  25      IS  17
RE  36      OR  25      OU  17
HE  33      NT  24      AR  16
IN  31      EA  22      AS  16
ED  30      TI  22      DE  16
ND  30      TO  22      RT  16
HA  26      IT  20      VE  16

Table V. Table of recurrence of groups of three letters to be expected in a count of 10,000 letters of English text.

THE  89     TIO  33     EDT  27
AND  54     FOR  33     TIS  25
THA  47     NDE  31     OFT  23
ENT  39     HAS  28     STH  21
ION  36     NCE  27     MEN  20

Table VI. Table of frequency of occurrence of letters as initials and finals of English words. Based on a count of 4,000 words; this table gives the figures for an average 100 words and is necessarily an approximation, like Table III. English words are derived from so many sources that it is not impossible for any letter to occur as an initial or final of a word, although Q, X and Z are rare as initials and B, I, J, Q, V, X and Z are rare as finals.

Letters   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Initial   9  6  6  5  2  4  2  3  3  1  1  2  4  2 10  2  -  4  5 17  2  -  7  -  3  -
Final     1  -  - 10 17  6  4  2  -  -  1  6  1  9  4  1  -  8  9 11  1  -  1  -  8  -

It is practically impossible to find five consecutive letters in an English text without a vowel and we may expect from one to three with two as the general average. In any twenty letters we may expect to find from 6 to 9 vowels with 8 as an average.
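The two vowel checks just stated (no run of five letters without a vowel, and from 6 to 9 vowels in any twenty letters) lend themselves to a mechanical test. The following Python sketch is an illustration under stated assumptions: only A, E, I, O and U are treated as vowels, so Y always counts as a consonant, and the function names are hypothetical.

```python
VOWELS = frozenset("AEIOU")  # assumption: Y is counted as a consonant here


def clean(text):
    """Keep only the letters A-Z, upper-cased."""
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]


def longest_vowelless_run(text):
    """Length of the longest run of consecutive letters containing no vowel."""
    longest = current = 0
    for ch in clean(text):
        current = 0 if ch in VOWELS else current + 1
        longest = max(longest, current)
    return longest


def vowels_per_window(text, size=20):
    """Vowel count in every window of `size` consecutive letters (sliding by one)."""
    flags = [ch in VOWELS for ch in clean(text)]
    if len(flags) < size:
        return []
    count = sum(flags[:size])
    counts = [count]
    for i in range(size, len(flags)):
        # Add the letter entering the window, drop the one leaving it.
        count += flags[i] - flags[i - size]
        counts.append(count)
    return counts
```

For a text that follows the figures above, `longest_vowelless_run` would be expected to stay below five and most values returned by `vowels_per_window` to fall between 6 and 9.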
Among themselves the relative frequency of occurrence of each of the vowels (including Y when a vowel) is as follows:

A, 19.5%    E, 32.0%    I, 16.7%    O, 20.2%    U, 8.0%    Y, 3.6%

The foregoing tables give all the essential facts about the mechanism of the English language from the standpoint of the solution of ciphers. The use to be made of these tables will be evident when the solution of different types of ciphers is taken up.

Data for the Solution of Ciphers in Spanish

The Spanish language is written with the following alphabet:

A B C CH D E F G H I J L LL M N Ñ O P Q R RR S T U V X Y Z

while the exact sense often depends upon the use of accents over the vowels. However, in cipher work it is exceedingly inconvenient to use the permanent digraphs CH, LL and RR, and they do not appear as such in any specimens of Spanish or Mexican cipher examined. Accented vowels and Ñ are also not found and we may, in general, say that a cipher whose text is Spanish will be prepared with the following alphabet:

A B C D E F G H I J L M N O P Q R S T U V X Y Z

and the receiver must supply the accents and the tilde over the N to conform to the general sense.

However, many Mexican cipher alphabets contain the letters K and W. This is particularly true of the ciphers in use by secret service agents who must be prepared to handle words like NEW YORK, WILSON and WASHINGTON. The letters K and W will, however, have a negligible frequency except in short messages where words like these occur more than once.

In this connection, if a cipher contains Mexican geographical names like CHIHUAHUA, MEXICO, MUZQUIZ, the letters H, X and Z will have a somewhat exaggerated frequency.

In Spanish, the letter Q is always followed by U and the U is always followed by one of the other vowels, A, E, I or O. As QUE or QUI occurs not infrequently in Spanish text, particularly in telegraphic correspondence, it is well worth noting that, if a Q occurs in a transposition cipher, we must connect it with U and another vowel.
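As a rough illustration of the Q and U relation, the sketch below lists, for every Q in a cipher text, the distance to every U. It assumes a transposition cipher, in which letters are moved but not changed, so that a distance recurring for several Q's may hint at how the text was rearranged. The function name is hypothetical, and the sketch does no more than tabulate distances.

```python
def q_u_offsets(ciphertext):
    """Map each position of Q to the offsets of every U relative to it.

    A positive offset means the U stands after the Q in the cipher text,
    a negative one that it stands before.
    """
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    q_positions = [i for i, ch in enumerate(letters) if ch == "Q"]
    u_positions = [i for i, ch in enumerate(letters) if ch == "U"]
    return {q: [u - q for u in u_positions] for q in q_positions}
```

For example, `q_u_offsets("XQAAUB")` gives `{1: [3]}`: the only Q stands at position 1 and the only U three places after it.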
The clue to several transposition ciphers has been found from this simple relation.

Table VII. Normal frequency table for military orders and reports, calculated on a basis of 10,000 letters of Spanish text. The graphic form is on a basis of 200 letters.

Letter   10,000 Letters   200 Letters
A            1352         27  111111111111111111111111111
B             102          2  11
C             474          9  111111111
D             524         10  1111111111
E            1402         28  1111111111111111111111111111
F              91          2  11
G             137          3  111
H             102          2  11
I             606         12  111111111111
J              41          1  1
L             517         10  1111111111
M             300          6  111111
N             619         12  111111111111
O             818         16  1111111111111111
P             257          5  11111
Q              87          2  11
R             751         15  111111111111111
S             724         14  11111111111111
T             422          8  11111111
U             387          7  1111111
V              85          2  11
X               6          -
Y             103          2  11
Z              42          1  1

In this table the vowels AEIOU = 45.65%; consonants LNRST = 30.33%; consonants JKQXZ = 1.76%.

Order of letters: E A O R S N I D L C T U M P G Y (BH) F Q V Z J X.

Table VIII. Table of frequency of digraphs, duals or pairs, Spanish text. Like Table III, this table is on the basis of 2,000 letters although prepared from a count of 20,000 letters. For this reason it is, to a certain extent, an approximation; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal lines at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus, in two thousand letters, we may expect to find AI twice and IA twenty-three times.

   A B C D E F G H I J L M N O P Q R S T U V X Y Z
A  941911561723541893202911218625  A
B  6314  B
C  24662453889522  C
D  3129319131094  D
E  122659101572121822493825282533  E
F  444331  F
G  24842  G
H  2121021  H
I  22316523111361053  I
J  321  J
L  21363933721561222  L
M  126516157261  M
N  32462832122  N
O  26222634916282015711  O
P  13324927411  P
Q  1151231  Q
R  402724436311173  R
S  3952107142143  S
T  51344185630  T
U  2426345264171521  U
V  222222  V
X  X
Y  562522  Y
Z  12142  Z
   A B C D E F G H I J L M N O P Q R S T U V X Y Z

Table IX. Order of frequency of common pairs to be expected in a count of 2,000 letters of Spanish military orders and reports. Based on Table VIII.

DE  59      ON  32      AC  24
LA  54      AD  31      EC  24
ES  52      ST  30      CI  23
EN  46      ED  29      IA  23
AR  40      RA  29      DO  22
AS  39      TE  28      NE  22
EL  39      ER  27      AL  21
RE  38      CO  26      LL  21
OR  36      SE  25      PA  20
AN  32      UE  25      PO  20

Alphabetic Frequency Tables (Truesdell)

Frequency of occurrence in 1,000 letters of text:

Letter   French   German   Italian   Portuguese
A           80       52      117        140
B            6       18        6          6
C           33       31       45         34
D           40       51       31         40
E          197      173      126        142
F            9       21       10         12
G            7       42       17         10
H            6       41        6         10
I           65       81      114         59
J            3        1        1          5
K            1       10        1          -
L           49       28       72         32
M           31       20       30         46
N           79      120       66         48
O           57       28       93        110
P           32        8       30         28
Q           12        1        3         16
R           74       69       64         64
S           66       57       49         88
T           65       60       60         43
U           62       51       29         46
V           21        9       20         15
W            1       15        -          -
X            3        1        1          1
Y            2        1        1          1
Z            1       14       12          4

Order of Frequency

French:      E A N R S I U O L D C P M V Q F G B J Y Z T H X
German:      E N I R T S A D G H C L F M B W Z K V P J Q X Y U O
Italian:     E A I O L N R T S C D M U V G Z F B Q P H
Portuguese:  E A O S R I N M T D C L P Q V F G B J Z X Y U H

Graphic Frequency Tables

Frequency of occurrence in 200 letters of text.

French

A  16  1111111111111111
B   2  11
C   6  111111
D  10  1111111111
E  39  111111111111111111111111111111111111111
F   2  11
G   1  1
H   1  1
I  13  1111111111111
J   1  1
K   -
L  10  1111111111
M   6  111111
N  16  1111111111111111
O  11  11111111111
P   6  111111
Q   2  11
R  15  111111111111111
S  13  1111111111111
T  13  1111111111111
U  12  111111111111
V   4  1111
W   -
X   1  1
Y   -
Z   -

Italian

A  23  11111111111111111111111
B   1  1
C   9  111111111
D   6  111111
E  25  1111111111111111111111111
F   2  11
G   3  111
H   1  1
I  23  11111111111111111111111
L  14  11111111111111
M   6  111111
N  13  1111111111111
O  19  1111111111111111111
P   6  111111
Q   -
R  13  1111111111111
S  10  1111111111
T  12  111111111111
U   6  111111
V   4  1111
X   -
Y   -
Z   2  11

German

A  10  1111111111
B   4  1111
C   6  111111
D  10  1111111111
E  32  11111111111111111111111111111111
F   4  1111
G   8  11111111
H   8  11111111
I  16  1111111111111111
J   -
K   2  11
L   6  111111
M   4  1111
N  24  111111111111111111111111
O   6  111111
P   2  11
Q   -
R  14  11111111111111
S  11  11111111111
T  12  111111111111
U  10  1111111111
V   2  11
W   3  111
X   -
Y   -
Z   3  111

Portuguese

A  28  1111111111111111111111111111
B   1  1
C   7  1111111
D   8  11111111
E  28  1111111111111111111111111111
F   2  11
G   2  11
H   2  11
I  12  111111111111
J   1  1
L   6  111111
M   9  111111111
N  10  1111111111
O  22  1111111111111111111111
P   6  111111
Q   3  111
R  13  1111111111111
S  18  111111111111111111
T   9  111111111
U   9  111111111
V   3  111
X   -
Y   -
Z   1  1

1. Occurrence rare, usually in proper names.
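Since every language has its own distinctive frequency table, a count of a text can be compared with the Truesdell figures above to suggest which language it is written in. The Python sketch below is one possible way of doing so and is not a method given in this chapter: the sum of squared differences used as the measure, the function name and the restriction to the 21 letters common to all four columns are assumptions. It applies only where the letters keep their identities, as in a transposition cipher or a deciphered text.

```python
from collections import Counter

# The 21 letters with a figure in all four Truesdell columns (J, K, W, X, Y omitted).
LETTERS = "ABCDEFGHILMNOPQRSTUVZ"

# Frequency per 1,000 letters, in the order of LETTERS, copied from the table above.
PER_1000 = {
    "French": [80, 6, 33, 40, 197, 9, 7, 6, 65, 49, 31,
               79, 57, 32, 12, 74, 66, 65, 62, 21, 1],
    "German": [52, 18, 31, 51, 173, 21, 42, 41, 81, 28, 20,
               120, 28, 8, 1, 69, 57, 60, 51, 9, 14],
    "Italian": [117, 6, 45, 31, 126, 10, 17, 6, 114, 72, 30,
                66, 93, 30, 3, 64, 49, 60, 29, 20, 12],
    "Portuguese": [140, 6, 34, 40, 142, 12, 10, 10, 59, 32, 46,
                   48, 110, 28, 16, 64, 88, 43, 46, 15, 4],
}


def closest_language(text):
    """Return (best_language, scores) using a sum of squared differences.

    A lower score means the observed counts per 1,000 letters lie closer
    to that language's table. This measure is an illustrative choice.
    """
    letters = [ch for ch in text.upper() if ch in LETTERS]
    if not letters:
        raise ValueError("text contains none of the tabulated letters")
    counts = Counter(letters)
    observed = [counts[ch] * 1000 / len(letters) for ch in LETTERS]
    scores = {
        language: sum((o - e) ** 2 for o, e in zip(observed, table))
        for language, table in PER_1000.items()
    }
    return min(scores, key=scores.get), scores
```

A short message may easily be misjudged by such a measure, since the tables hold only approximately for small counts; the result is best treated as a hint to be confirmed by the digraph tables and the linguistic peculiarities noted above.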
Based on Table VIII.DE59ON32AC24LA54AD31EC24ES52ST30CI23EN46ED29IA23AR40RA29DO22AS39TE28NE22EL39ER27AL21RE38CO26LL21OR36SE25PA20AN32UE25PO20Alphabetic Frequency Tables(Truesdell)Frequency of occurrence in 1,000 letters of text:LetterFrenchGermanItalianPortugueseA8052117140B61866C33314534D40513140E197173126142F9211012G7421710H641610I658111459J3115K1101L49287232M31203046N791206648O572893110P3283028Q121316R74696464S66574988T65606043U62512946V2192015W115X3111Y2111Z114124Order of FrequencyFrenchEANRSIUOLDCPMVQFGBJYZTHXGermanENIRTSADGHCLFMBWZKVPJQXYUOItalianEAIOLNRTSCDMUVGZFBQPHPortugueseEAOSRINMTDCLPQVFGBJZXYUHGraphic Frequency TablesFrequency of occurrence in 200 letters of text.FrenchA161111111111111111B211C6111111D101111111111E39111111111111111111111111111111111111111F211G11H11I131111111111111J11KL101111111111M6111111N161111111111111111O1111111111111P6111111Q211R15111111111111111S131111111111111T131111111111111U12111111111111V41111WX11YZItalianA2311111111111111111111111B11C9111111111D6111111E251111111111111111111111111F211G3111H11I2311111111111111111111111L1411111111111111M6111111N131111111111111O191111111111111111111P6111111QR131111111111111S101111111111T12111111111111U6111111V41111XYZ211GermanA101111111111B41111C6111111D101111111111E3211111111111111111111111111111111F41111G811111111H811111111I161111111111111111JK211L6111111M41111N24111111111111111111111111O6111111P211QR1411111111111111S1111111111111T12111111111111U101111111111V211W3111XYZ3111PortugueseA281111111111111111111111111111B11C71111111D811111111E281111111111111111111111111111F211G211H211I12111111111111J11L6111111M9111111111N101111111111O221111111111111111111111P6111111Q3111R131111111111111S18111111111111111111T9111111111U9111111111V3111XYZ11
With a few exceptions, notably Chinese, all modern languages are constructed of words which in turn are formed from letters. In any given language the number of letters and their conventional order are fixed. Thus English is written with 26 letters and their conventional order is A, B, C, D, E, etc. Some letters are used very frequently and others rarely. In fact, if ten thousand consecutive letters of a text be counted and the frequency of occurrence of each letter be noted, the numbers found will be practically identical with those obtained from any other text of ten thousand letters in the same language. The relative proportion of occurrence of the various letters will also hold approximately for even very short texts.
Such a count of a large number of letters, when it is put in the form of a table, is known as a frequency table. Every language has its own distinctive frequency table and, for any given language, the frequency table is almost as fixed as the alphabet. There are minor differences in frequency tables prepared from texts on special subjects. For example, if the text be newspaper matter, the frequency table will differ slightly from one prepared from military orders and will also differ slightly from one prepared from telegraph messages. But these differences are very slight as compared with the differences between the frequency tables of two different languages.
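The preparation of such a table is easily mechanized. The following is a minimal sketch in Python, not a prescribed method; the function name `frequency_table` and its parameter `per` are our own, and we assume that only the 26 unaccented letters A to Z are to be counted.

```python
from collections import Counter


def frequency_table(text, per=10_000):
    """Count the letters A-Z in `text`, scaled to `per` letters of text.

    Returns a dict ordered from the most to the least frequent letter.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # Most frequent first; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {letter: round(n * per / total) for letter, n in ordered}


if __name__ == "__main__":
    sample = "SEND THREE HUNDRED MEN TO THE BRIDGE AT ONCE"
    for letter, n in frequency_table(sample, per=200).items():
        print(letter, n)
```

Setting `per=200` gives figures on the same basis as the graphic tables of this chapter. Because each figure is rounded separately, the figures need not add up to exactly `per`.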
Again there is a fixed ratio of occurrence of every letter with every other for any language and this, put in table form, constitutes a table of frequency of digraphs. In the same way a table of trigraphs, showing the ratio of occurrence of any three letters in sequence, could be prepared, but such a table would be very extensive and a count of the more common three-letter combinations is usually used.
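A count of digraphs or trigraphs may be sketched in the same manner. The function `ngram_counts` below is a hypothetical helper of our own; it assumes that word spacing is ignored, so that groups running from the end of one word into the beginning of the next are counted like any others.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of `n` consecutive letters, ignoring word spacing."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


if __name__ == "__main__":
    message = "THE ENEMY HAS CROSSED THE RIVER AT THE FORD"
    # The five most common digraphs and trigraphs of this short message.
    print(ngram_counts(message, 2).most_common(5))
    print(ngram_counts(message, 3).most_common(5))
```

Only the more common groups need be kept; `most_common` serves to select them.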
Other tables, such as frequency of initial and final letters of words, might be of value but the common practice is to put cipher text into groups of five or ten letters each and eliminate word forms. This is almost a necessity in telegraphic and radio communication to enable the receiving operator to check correct receipt of a message. He must get five letters, neither more nor less, per word or he is sure a mistake has been made. There is little difficulty, as a rule, in restoring word forms in the deciphered message.
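The grouping of a text and the operator's check may be illustrated by a short sketch. The names `to_groups` and `malformed_groups` are ours; we assume groups separated by single spaces, and we make no provision for filling out a short final group, since the practice in that respect is not stated here.

```python
def to_groups(text, size=5):
    """Strip word forms and rewrite the letters in groups of `size`."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


def malformed_groups(received, size=5):
    """Return the received groups that do not contain exactly `size` letters."""
    return [group for group in received.split() if len(group) != size]


if __name__ == "__main__":
    print(to_groups("Attack at dawn"))
    print(malformed_groups("ATTAC KATD AWNXX"))
```

A group reported by `malformed_groups` indicates that a letter was dropped or added in transmission and that a repetition should be requested.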
We will now take up, in order, the various frequency tables and linguistic peculiarities of English and Spanish. Frequency tables for single letters in French, German, Italian and Portuguese will follow. All frequency tables have been re-calculated from at least ten thousand letters of text and compared with existing tables. No marked difference has been found in any case between the re-calculated tables and those already in use.
Data for Solution of Ciphers in English
Table I.—Normal frequency table. Frequency for ten thousand letters and for two hundred letters. This latter is put in graphic form and is necessarily an approximation. Taken from military orders and reports, English text.
Letter  10,000 Letters  200 Letters
A       778             16  1111111111111111
B       141             3   111
C       296             6   111111
D       402             8   11111111
E       1277            26  11111111111111111111111111
F       197             4   1111
G       174             3   111
H       595             12  111111111111
I       667             13  1111111111111
J       51              1   1
K       74              2   11
L       372             7   1111111
M       288             6   111111
N       686             14  11111111111111
O       807             16  1111111111111111
P       223             4   1111
Q       8
R       651             13  1111111111111
S       622             12  111111111111
T       855             17  11111111111111111
U       308             6   111111
V       112             2   11
W       176             3   111
X       27
Y       196             4   1111
Z       17
Vowels AEIOU = 38.37%; consonants LNRST = 31.86%; consonants JKQXZ = 1.77%.
The vowels may be safely taken as 40%, consonants LNRST as 30% and consonants JKQXZ as 2%.
Order of letters: E T O A N I R S H D L U C M P F Y W G B V K J X Z Q.
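The percentages and the order of letters given for Table I may be checked directly from its counts. The following is a minimal sketch in Python; the names `TABLE_I` and `class_percentage` are ours, introduced for illustration only, while the counts themselves are those of Table I.

```python
# Counts per 10,000 letters, copied from Table I.
TABLE_I = {
    "A": 778, "B": 141, "C": 296, "D": 402, "E": 1277, "F": 197, "G": 174,
    "H": 595, "I": 667, "J": 51, "K": 74, "L": 372, "M": 288, "N": 686,
    "O": 807, "P": 223, "Q": 8, "R": 651, "S": 622, "T": 855, "U": 308,
    "V": 112, "W": 176, "X": 27, "Y": 196, "Z": 17,
}


def class_percentage(table, letters):
    """Percentage of all counted letters that belong to the class `letters`."""
    total = sum(table.values())
    return 100 * sum(table[c] for c in letters) / total


vowels = class_percentage(TABLE_I, "AEIOU")
common = class_percentage(TABLE_I, "LNRST")
rare = class_percentage(TABLE_I, "JKQXZ")

# Letters arranged from the most to the least frequent.
order = "".join(sorted(TABLE_I, key=TABLE_I.get, reverse=True))

if __name__ == "__main__":
    print(round(vowels, 2), round(common, 2), round(rare, 2))
    print(" ".join(order))
```

The same function, applied to a count made from a cipher, gives the three proportions of the message in hand for comparison with these normal figures.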
Table II.—Frequency table for telegraph messages, English text. This table varies slightly from the standard frequency table because the common word “the” is rarely used in telegrams and there is a tendency to use longer and less common words in preparing telegraph messages.
Letter  10,000 Letters  200 Letters
A       813             16  1111111111111111
B       149             3   111
C       306             6   111111
D       417             8   11111111
E       1319            26  11111111111111111111111111
F       205             4   1111
G       201             4   1111
H       386             8   11111111
I       711             14  11111111111111
J       42              1   1
K       88              2   11
L       392             8   11111111
M       273             6   111111
N       718             14  11111111111111
O       844             17  11111111111111111
P       243             5   11111
Q       38              1   1
R       677             14  11111111111111
S       656             13  1111111111111
T       634             13  1111111111111
U       321             6   111111
V       136             3   111
W       166             3   111
X       51              1   1
Y       208             4   1111
Z       6
In this table the vowels AEIOU = 40.08%, consonants LNRST = 30.77% and consonants JKQXZ = 2.25%.
Order of letters: E O A N I R S T D L H U C M P Y F G W B V K X J Q Z.
Table III.—Table of frequency of digraphs, duals or pairs (English). This table was prepared from 20,000 letters, but the figures shown are on the basis of 2,000 letters. For this reason they are, to a certain extent, approximate; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal line at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus in two thousand letters we may expect to find AH once and HA twenty-six times.
ABCDEFGHIJKLMNOPQRSTUVWXYZA17102232264227811291312924112B512111122131C61114211113231111D61230124301411113E111416122633102618141217361112216511F328212213253111G413211231H111241412112105032I21412651121598121312132223611J1K112211L146216111693633235M731322341104112N3832521313223943112O1112488312182478371315222615P218124232181431Q2111R16133403626121258228112S1613251217121127291161116T25131213523202124821620116227U1216132233117153551V31553251W128111124233X14211Y3224118121317Z111ABCDEFGHIJKLMNOPQRSTUVWXYZ
Table IV.—Order of frequency of common pairs to be expected in a count of 2,000 letters of military or semi-military English text. (Based on a count of 20,000 letters).
TH 50    AT 25    ST 20
ER 40    EN 25    IO 18
ON 39    ES 25    LE 18
AN 38    OF 25    IS 17
RE 36    OR 25    OU 17
HE 33    NT 24    AR 16
IN 31    EA 22    AS 16
ED 30    TI 22    DE 16
ND 30    TO 22    RT 16
HA 26    IT 20    VE 16
Table V.—Table of recurrence of groups of three letters to be expected in a count of 10,000 letters of English text.
THE 89    TIO 33    EDT 27
AND 54    FOR 33    TIS 25
THA 47    NDE 31    OFT 23
ENT 39    HAS 28    STH 21
ION 36    NCE 27    MEN 20
Table VI.—Table of frequency of occurrence of letters as initials and finals of English words. Based on a count of 4,000 words; this table gives the figures for an average 100 words and is necessarily an approximation, like Table III. English words are derived from so many sources that it is not impossible for any letter to occur as an initial or final of a word, although Q, X and Z are rare as initials and B, I, J, Q, V, X and Z are rare as finals.
Letters  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Initial  9  6  6  5  2  4  2  3  3  1  1  2  4  2 10  2  -  4  5 17  2  -  7  -  3  -
Final    1  -  - 10 17  6  4  2  -  -  1  6  1  9  4  1  -  8  9 11  1  -  1  -  8  -
It is practically impossible to find five consecutive letters in an English text without a vowel; in any five letters we may expect from one to three vowels, with two as the general average. In any twenty letters we may expect to find from 6 to 9 vowels with 8 as an average. Among themselves the relative frequency of occurrence of each of the vowels (including Y when a vowel) is as follows:
A, 19.5%    E, 32.0%    I, 16.7%    O, 20.2%    U, 8.0%    Y, 3.6%
The foregoing tables give all the essential facts about the mechanism of the English language from the standpoint of the solution of ciphers. The use to be made of these tables will be evident when the solution of different types of ciphers is taken up.
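The rules given above for the distribution of vowels lend themselves to a simple mechanical test of a supposed plain text. What follows is only a sketch under stated assumptions: the names `longest_consonant_run` and `vowels_per_block` are ours, Y is always treated as a consonant, and an incomplete final block of letters is disregarded.

```python
VOWELS = set("AEIOU")  # assumption: Y is never counted as a vowel here


def only_letters(text):
    """Upper-case the text and keep the letters A-Z only."""
    return "".join(c for c in text.upper() if "A" <= c <= "Z")


def longest_consonant_run(text):
    """Length of the longest run of consecutive letters without a vowel."""
    longest = run = 0
    for c in only_letters(text):
        run = 0 if c in VOWELS else run + 1
        longest = max(longest, run)
    return longest


def vowels_per_block(text, size=20):
    """Number of vowels in each complete block of `size` letters."""
    letters = only_letters(text)
    full = len(letters) - len(letters) % size
    return [sum(c in VOWELS for c in letters[i:i + size])
            for i in range(0, full, size)]


if __name__ == "__main__":
    trial = "THE ENEMY IS ADVANCING ON THE LEFT FLANK TODAY"
    print(longest_consonant_run(trial), vowels_per_block(trial))
```

Long runs without a vowel, or blocks of twenty letters falling well outside the range of 6 to 9 vowels, suggest that the text under trial is not yet English plain text.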
Data for the Solution of Ciphers in Spanish
The Spanish language is written with the following alphabet:
A B C CH D E F G H I J L LL M N Ñ O P Q R RR S T U V X Y Z
while the exact sense often depends upon the use of accents over the vowels. However, in cipher work it is exceedingly inconvenient to use the permanent digraphs CH, LL and RR, and they do not appear as such in any specimens of Spanish or Mexican cipher examined. Accented vowels and Ñ are also not found and we may, in general, say that a cipher whose text is Spanish will be prepared with the following alphabet:
A B C D E F G H I J L M N O P Q R S T U V X Y Z
and the receiver must supply the accents and the tilde over the N to conform to the general sense.
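The reduction of a Spanish text to this cipher alphabet may be sketched as follows. The function name `to_cipher_alphabet` is our own, and the use of Unicode decomposition to separate accents and the tilde from their letters is an assumption of this sketch, not a method described in the text.

```python
import unicodedata


def to_cipher_alphabet(text):
    """Reduce Spanish text to unaccented capital letters.

    NFD decomposition splits an accented vowel or an N with tilde into the
    bare letter followed by a combining mark; keeping only A-Z then drops
    the marks together with spaces and punctuation. CH, LL and RR remain
    as two separate letters each.
    """
    decomposed = unicodedata.normalize("NFD", text.upper())
    return "".join(c for c in decomposed if "A" <= c <= "Z")


if __name__ == "__main__":
    print(to_cipher_alphabet("El niño llegó a Chihuahua"))
```

The letters K and W, if present in the text, pass through unchanged.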
However, many Mexican cipher alphabets contain the letters K and W. This is particularly true of the ciphers in use by secret service agents who must be prepared to handle words like NEW YORK, WILSON and WASHINGTON. The letters K and W will, however, have a negligible frequency except in short messages where words like these occur more than once.
In this connection, if a cipher contains Mexican geographical names like CHIHUAHUA, MEXICO, MUZQUIZ, the letters H, X and Z will have a somewhat exaggerated frequency.
In Spanish, the letter Q is always followed by U and the U is always followed by one of the other vowels, A, E, I or O. As QUE or QUI occurs not infrequently in Spanish text, particularly in telegraphic correspondence, it is well worth noting that, if a Q occurs in a transposition cipher, we must connect it with U and another vowel. The clue to several transposition ciphers has been found from this simple relation.
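This relation furnishes a quick test of any trial rearrangement of a Spanish transposition cipher. The sketch below is illustrative only; the name `q_rule_holds` is ours, and we assume that a Q standing too near the end of the text to be followed by two letters counts as a failure.

```python
import re

# Matches a Q that is NOT followed by U and then one of A, E, I, O.
_BAD_Q = re.compile(r"Q(?!U[AEIO])")


def q_rule_holds(candidate):
    """True if every Q in the trial text is followed by U and A, E, I or O."""
    letters = "".join(c for c in candidate.upper() if "A" <= c <= "Z")
    return _BAD_Q.search(letters) is None


if __name__ == "__main__":
    for trial in ("QUE QUIERE", "MUZQUIZ", "EQUNA", "ATAQ"):
        print(trial, q_rule_holds(trial))
```

A trial arrangement which fails this test may be rejected at once; one which passes it must still be judged by the sense of the resulting text.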
Table VII.—Normal frequency table for military orders and reports, calculated on a basis of 10,000 letters of Spanish text. The graphic form is on a basis of 200 letters.
Letter  10,000 Letters  200 Letters
A       1352            27  111111111111111111111111111
B       102             2   11
C       474             9   111111111
D       524             10  1111111111
E       1402            28  1111111111111111111111111111
F       91              2   11
G       137             3   111
H       102             2   11
I       606             12  111111111111
J       41              1   1
L       517             10  1111111111
M       300             6   111111
N       619             12  111111111111
O       818             16  1111111111111111
P       257             5   11111
Q       87              2   11
R       751             15  111111111111111
S       724             14  11111111111111
T       422             8   11111111
U       387             7   1111111
V       85              2   11
X       6
Y       103             2   11
Z       42              1   1
In this table the vowels AEIOU = 45.65%; consonants LNRST = 30.33%; consonants JKQXZ = 1.76%.
Order of letters:
E A O R S N I D L C T U M P G Y (BH) F Q V Z J X.
Table VIII.—Table of frequency of digraphs, duals or pairs, Spanish text. Like Table III, this table is on the basis of 2,000 letters although prepared from a count of 20,000 letters. For this reason it is, to a certain extent, an approximation; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal lines at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus, in two thousand letters, we may expect to find AI twice and IA twenty-three times.
ABCDEFGHIJLMNOPQRSTUVXYZA941911561723541893202911218625AB6314BC24662453889522CD3129319131094DE122659101572121822493825282533EF444331FG24842GH2121021HI22316523111361053IJ321JL21363933721561222LM126516157261MN32462832122NO26222634916282015711OP13324927411PQ1151231QR402724436311173RS3952107142143ST51344185630TU2426345264171521UV222222VXXY562522YZ12142ZABCDEFGHIJLMNOPQRSTUVXYZ
Table IX.—Order of frequency of common pairs to be expected in a count of 2,000 letters of Spanish military orders and reports. Based on Table VIII.
DE 59    ON 32    AC 24
LA 54    AD 31    EC 24
ES 52    ST 30    CI 23
EN 46    ED 29    IA 23
AR 40    RA 29    DO 22
AS 39    TE 28    NE 22
EL 39    ER 27    AL 21
RE 38    CO 26    LL 21
OR 36    SE 25    PA 20
AN 32    UE 25    PO 20
Alphabetic Frequency Tables (Truesdell)
Frequency of occurrence in 1,000 letters of text:
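Letter  French  German  Italian  Portuguese
A           80      52      117         140
B            6      18        6           6
C           33      31       45          34
D           40      51       31          40
E          197     173      126         142
F            9      21       10          12
G            7      42       17          10
H            6      41        6          10
I           65      81      114          59
J            3       1      [1]           5
K          [1]      10      [1]
L           49      28       72          32
M           31      20       30          46
N           79     120       66          48
O           57      28       93         110
P           32       8       30          28
Q           12     [1]        3          16
R           74      69       64          64
S           66      57       49          88
T           65      60       60          43
U           62      51       29          46
V           21       9       20          15
W          [1]      15
X            3     [1]      [1]           1
Y            2     [1]      [1]           1
Z            1      14       12           4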
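Since each language has its own distinctive frequency table, a count of an unknown text can be compared against the figures above to suggest its language. The following Python sketch does this for ten common letters only, using the values from the table. The choice of letters, the squared-difference score, and all function names are assumptions made for illustration and are not part of the source.

```python
from collections import Counter

LETTERS = "AEIOUNRSTL"  # hypothetical selection of ten common letters

# Occurrences per 1,000 letters for A E I O U N R S T L, from the table.
PER_1000 = {
    "French":     [80, 197, 65, 57, 62, 79, 74, 66, 65, 49],
    "German":     [52, 173, 81, 28, 51, 120, 69, 57, 60, 28],
    "Italian":    [117, 126, 114, 93, 29, 66, 64, 49, 60, 72],
    "Portuguese": [140, 142, 59, 110, 46, 48, 64, 88, 43, 32],
}


def shares(values):
    """Convert counts to proportions of their own total."""
    total = sum(values)
    return [v / total for v in values]


def closest_language(text):
    """Return the language whose letter proportions best match the text."""
    counts = Counter(c for c in text.upper() if c in LETTERS)
    if not counts:
        raise ValueError("text contains none of the tracked letters")
    observed = shares([counts[ch] for ch in LETTERS])

    def score(language):
        # Sum of squared differences between observed and tabulated shares.
        table = shares(PER_1000[language])
        return sum((o - e) ** 2 for o, e in zip(observed, table))

    return min(PER_1000, key=score)
```

A short text gives only an approximate count, so the result is an indication to be confirmed by other means rather than a determination.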
Order of Frequency
French
EANRSIUOLDCPMVQFGBJYZTHX
German
ENIRTSADGHCLFMBWZKVPJQXYUO
Italian
EAIOLNRTSCDMUVGZFBQPH
Portuguese
EAOSRINMTDCLPQVFGBJZXYUH
Graphic Frequency Tables
Frequency of occurrence in 200 letters of text.
French
A 16 1111111111111111
B  2 11
C  6 111111
D 10 1111111111
E 39 111111111111111111111111111111111111111
F  2 11
G  1 1
H  1 1
I 13 1111111111111
J  1 1
K
L 10 1111111111
M  6 111111
N 16 1111111111111111
O 11 11111111111
P  6 111111
Q  2 11
R 15 111111111111111
S 13 1111111111111
T 13 1111111111111
U 12 111111111111
V  4 1111
W
X  1 1
Y
Z
Italian
A 23 11111111111111111111111
B  1 1
C  9 111111111
D  6 111111
E 25 1111111111111111111111111
F  2 11
G  3 111
H  1 1
I 23 11111111111111111111111
L 14 11111111111111
M  6 111111
N 13 1111111111111
O 19 1111111111111111111
P  6 111111
Q
R 13 1111111111111
S 10 1111111111
T 12 111111111111
U  6 111111
V  4 1111
X
Y
Z  2 11
German
A 10 1111111111
B  4 1111
C  6 111111
D 10 1111111111
E 32 11111111111111111111111111111111
F  4 1111
G  8 11111111
H  8 11111111
I 16 1111111111111111
J
K  2 11
L  6 111111
M  4 1111
N 24 111111111111111111111111
O  6 111111
P  2 11
Q
R 14 11111111111111
S 11 11111111111
T 12 111111111111
U 10 1111111111
V  2 11
W  3 111
X
Y
Z  3 111
Portuguese
A 28 1111111111111111111111111111
B  1 1
C  7 1111111
D  8 11111111
E 28 1111111111111111111111111111
F  2 11
G  2 11
H  2 11
I 12 111111111111
J  1 1
L  6 111111
M  9 111111111
N 10 1111111111
O 22 1111111111111111111111
P  6 111111
Q  3 111
R 13 1111111111111
S 18 111111111111111111
T  9 111111111
U  9 111111111
V  3 111
X
Y
Z  1 1
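A graphic table of this kind can be produced from any letter count. The following Python sketch scales a count to a basis of 200 letters and prints one mark per occurrence. The function names are hypothetical, accented letters are simply skipped, and the rounding rule is an assumption, so the rounded figures need not total exactly 200.

```python
from collections import Counter
import string


def per_200(text, basis=200):
    """Scale the letter counts of `text` to `basis` letters, rounded."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: round(counts[ch] * basis / len(letters))
            for ch in string.ascii_uppercase}


def graphic_rows(table):
    """One row per letter: the letter, its count, and one mark per occurrence."""
    rows = []
    for ch in sorted(table):
        n = table[ch]
        rows.append(f"{ch} {n:>2} {'1' * n}" if n else ch)
    return rows


# Three entries from the French table: A 16, B 2, K blank.
for row in graphic_rows({"A": 16, "B": 2, "K": 0}):
    print(row)
```

A letter with no occurrences is printed alone, matching the blank rows of the tables.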
[1] Occurrence rare, usually in proper names.