14 Phonotactic aspects of the linguistic expression

BENGT SIGURD
Department of Scandinavian Languages, University of Lund

1. Introduction

Phonological descriptions of languages generally consist of a description of the phonemes of the language and a description of the distribution of the phonemes. The latter part is also referred to as the phonotactic structure of the language.1 Phonotactic data are generally presented as word, morpheme or syllable formulae, as lists of phoneme sequences, or as rules of the following type: only word medial sequences (interludes) are permitted in the language; the longest type of consonant cluster has three members; dentals do not combine in initial clusters; the vowel before / is always short, etc. Such information may be given in relation to various units, such as the utterance, macrosegment, word, morpheme or syllable. The choice of reference unit is a problem in many languages. Phonotactic descriptions may also be called phonological grammars (Householder, 1959; Saporta and Contreras, 1962; Romeo, 1964). In this case the permitted strings of phonemes constitute the language which is to be generated by the grammar. The relation of descriptions of phonotactic structures to the sentence grammar of the language is not clear. The scholars mentioned above hold that it is necessary to supplement the sentence grammar with a phonotactic description, but others (Halle, 1962) maintain that there is no motivation for separate phonotactic descriptions: the phonotactic restrictions will appear at their proper place in the sentence grammar of the language as rules concerning the combination of distinctive features (provided the lexical items are specified in terms of distinctive features). It might be argued, however, that the shape of words or syllables is a characteristic feature of the language which is hard to extract from a sentence grammar. The permitted phoneme sequences are generally much restricted: only a fraction of the combinations that could be formed by combining the phonemes in all possible ways actually occurs. The restrictions vary greatly between languages, and given only the set of phonemes it is not possible to produce acceptable words. There might be languages with identical phoneme inventories but greatly different ways of combining the phonemes. Phonemes can be classified according to their combinatory possibilities. The study of phonotactic structures aims at discovering the characteristic patterns of the language. These patterns seem to be fairly stable properties of languages, and they can be used for comparative or typological purposes.

1 The useful term 'phonotactics' was introduced by Robert P. Stockwell (Hill, 1958). The expression 'distribution of phonemes' is sometimes used in the sense of the relative frequencies of the phonemes (Herdan, 1956). I am indebted to Gerald Sanders, Indiana University, for revising my English and suggesting many improvements, in particular in the part dealing with string-replacement rules.

The permitted phoneme arrangements (and the permitted letter arrangements) are of interest to those who coin new words such as personal names, names of industrial products and trade names. The accidental gaps found in the analysis suggest potential words which can be used without breaking the rules of the language. The semantic associations of certain sequences must, however, also be taken into account in such cases. Phonotactic investigations are also of importance to areas such as shorthand writing, cryptography, speech audiometry and automatic speech recognition. The teaching of foreign languages may also benefit from phonotactic studies. A contrastive study may show that the native language and the foreign language combine identical phonemes differently.
Initial German kn is, for instance, difficult for an English speaker, since this cluster does not occur in English although both k and n occur. This chapter gives a survey of the problems and methods of phonotactic investigation. Data from several languages will be treated for the sake of demonstration.2

2 The general problems and methods have been treated in Trubetzkoy, 1939; Vogt, 1942; Fischer-Jørgensen, 1952; Hockett, 1955; Haugen, 1956a; Harary and Paper, 1957; Spang-Hanssen, 1959; Malmberg, 1966; Sigurd, 1965; Greenberg, 1965.

2. The choice of units

It is apparent from language descriptions that various linguistic units have been chosen as the frame of reference for phonotactic descriptions. Hockett suggests the macrosegment, the microsegment and the syllable as basic units; Trubetzkoy prefers the morpheme; Haugen and Fischer-Jørgensen insist on the syllable. Haugen and Fischer-Jørgensen have discussed the problem from general points of view, and they find the syllable the most suitable unit for comparing the phonotactic structures of different languages. Haugen goes further and tries to define the syllable on the basis of phonotactic structures: the syllable is the unit within which phoneme distribution can be most economically described. All languages seem to have syllables, and their structure is generally of the type (C)V(C).

The trouble is, however, that other units have an influence on the phonotactic patterns. In the Germanic languages, word final and syllable final consonant clusters differ between monomorphemic and polymorphemic sequences. The monomorphemic sequences are much more restricted and form a neater pattern; the addition of suffixes breaks this pattern. Although morphophonemic changes adapt many combinations to the basic pattern found in monomorphemic sequences, the morphological pressure may result in sequences which deviate considerably from the monomorphemic pattern.
In English the final cluster ksθ is found only in polymorphemic sequences, as in sixth. In Swedish the addition of suffixes may result in clusters of six or even more members, in words such as skälmskts {skälm-sk-t-s} (the nominalized genitive form of a neuter adjective derived from the noun skälm, 'rogue'; s, 'genitive'; t, 'neuter'; sk, 'derivational suffix'). Monomorphemic final clusters can have only three members in Swedish. Similarly, in Russian all initial clusters with more than three members seem to be polymorphemic.

The word is of phonotactic importance in some languages, such as Eskimo (Swadesh, 1946), Finnish (Haugen, 1956a) and Kutenai (Haugen, 1956b). Any syllable division in Finnish and Kutenai will yield syllable initial or syllable final sequences which cannot occur word initially or word finally, and rules for the occurrence of syllables in different positions of the word must therefore be introduced if we insist on describing the phonotactic pattern within syllables.

3. Data

Another problem is the heterogeneous character of linguistic data.3 Generally it is possible to find a fairly neat system inherent in frequent and genuine words. As we include more data, such as loan-words, archaic words, nursery words and interjections, the system breaks down or has to be supplemented in a way which destroys the neat patterning. Linguists usually try to get rid of the disturbing material by reference to the special character of the words, but it is generally difficult to apply such criteria so as to retain only the well-formed words. If frequency is used as a criterion, it will often turn out that words which we would like to include are less frequent than words which we would like to exclude. The relevance of statistical information to phonotactic investigations is not clear. It is also a problem whether running text frequency or lexical frequency should be used (Karlgren, 1961).
Such figures are supposed to show the importance of the patterns, but it is doubtful whether running text frequency or lexical frequency (or a combination of both) would do this best. This would seem to be part of the general problem of determining an appropriate role for statistical information in the general model of linguistic structure.

3 The heterogeneous nature of linguistic data has been pointed out recently by Malmberg (1964).

4. Methods of analysis and description

A list of permitted sequences is in itself a description of the data, but it does not reveal the inherent structure which we feel is present. The following list shows the permitted word initial consonant sequences in Swedish (0 means no consonant).

Single consonants: r, l, m, n, v, j, b, g, d, f, p, k, t, s, ʃ, ç, h, 0
2-member clusters: tr, pr, kr, fr, dr, br, gr, vr, sl, pl, kl, fl, bl, gl, pj, fj, bj, mj, nj, sv, tv, kv, dv, sm, sn, kn, fn, gn, st, sp, sk
3-member clusters: skr, spr, str, spj, skv, spl

The following diagram displays the inherent structure better. It generates all permitted sequences. The words to the right are examples.

Fig. 1. Word initial sequences.

4.1. Position analysis

The diagram above shows that we can define different classes on the basis of position in relation to the vowel. Class 1, defined as the class of phonemes which can occur one step before the vowel, contains all the phonemes; class 2, which includes the phonemes occurring two steps from the vowel, has a restricted membership; class 3 includes only s.
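Position classes of this kind can be read off mechanically. The Python sketch below is our own illustration, not part of the original analysis. It assumes the Swedish list above, with ʃ and ç as our reading of two symbols that are unclear in this copy, and it collects, for each distance from the vowel, the set of consonants found there. The helper names `phonemes` and `position_classes` are hypothetical.

```python
import unicodedata

# Swedish word initial sequences as listed above (0, "no consonant", omitted).
SINGLES = ["r", "l", "m", "n", "v", "j", "b", "g", "d", "f",
           "p", "k", "t", "s", "ʃ", "ç", "h"]
TWO = ["tr", "pr", "kr", "fr", "dr", "br", "gr", "vr", "sl", "pl",
       "kl", "fl", "bl", "gl", "pj", "fj", "bj", "mj", "nj", "sv",
       "tv", "kv", "dv", "sm", "sn", "kn", "fn", "gn", "st", "sp", "sk"]
THREE = ["skr", "spr", "str", "spj", "skv", "spl"]


def phonemes(sequence):
    """Split a sequence into one-letter phonemes; NFC keeps ç as one code point."""
    return tuple(unicodedata.normalize("NFC", sequence))


def position_classes(sequences):
    """Map each distance from the vowel (1 = immediately before it)
    to the set of phonemes found at that distance."""
    classes = {}
    for sequence in sequences:
        # Read the sequence backwards: its last phoneme is 1 step from the vowel.
        for distance, phoneme in enumerate(reversed(phonemes(sequence)), start=1):
            classes.setdefault(distance, set()).add(phoneme)
    return classes


all_classes = position_classes(SINGLES + TWO + THREE)
for distance in sorted(all_classes):
    print("class", distance, sorted(all_classes[distance]))

# Position 2 computed separately for two- and three-member clusters.
print(sorted(position_classes(TWO)[2]))    # ['b', 'd', 'f', 'g', 'k', 'm', 'n', 'p', 's', 't', 'v']
print(sorted(position_classes(THREE)[2]))  # ['k', 'p', 't']
```

With this list, class 3 comes out as {s}, and position 2 shrinks from eleven consonants to p, t, k when only three-member clusters are considered.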
It is possible to go further and define classes on the basis of the number of members in the clusters. Thus position 2 can be occupied only by p, t, k in three-member clusters, while it can be occupied by m, n, v, p, t, k, b, d, g, f, s if we include two-member clusters. The varying ability of phonemes to occur separated from the vowel may be used for measuring the vowel adherence of the consonants (Sigurd, 1965, p. 47).

4.2. Combination analysis

We may define classes of phonemes on the basis of their combinability. Such classes will be of different size. A big class may be said to show a more important feature of the language than a small class. In Swedish the labials (p, b, m, f, v) constitute a class of phonemes which do not combine initially; the palatals (k, g, ç, j) constitute another such class. The dental stops (d, t) do not combine with l; although this class is small, the fact is interesting, since it seems to be a characteristic of most Indo-European languages.

Combinations of nasals and stops show interesting differences between languages. In the basic Swedish system in syllable final position we have the following pattern.

Nasal      Following stop
labial     labial (mp, mb), dental
dental     dental
palatal    dental, palatal

Fig. 2.

The place of articulation of a stop following a dental nasal is thus predictable. A stop after a labial or palatal nasal may have two different places of articulation. In word medial position in Finnish, where we have only mp, nt, ŋk, nasal and stop always have the same place of articulation. In Spanish the place of articulation of a nasal at the end of a syllable is also predictable (from the pause or the following phoneme).

4.3. Order analysis

If we look at phoneme sequences purely from the point of view of the order of the phonemes, we may sometimes find an order relation inherent in the data.
An order relation should fulfil the following requirements: (1) there should be no reversible sequences, i.e. never both xy and yx; (2) if we have xy and yz, we should also have xz. The first criterion is the asymmetry criterion; the second is the transitivity criterion. The Swedish sequences are very close to an order relation. They meet the first requirement and almost meet the second. We have, for instance, sk and kn and also sn; we have kv and vr and also kr. If we add the missing sequences (sr, sj, kj, gj), we get an order relation which can be depicted as in fig. 3.

Fig. 3. Order diagram which generates initial two-member clusters in Swedish.

In this figure a box containing the added sequences has been drawn to the right of the order diagram. This box can be considered as a filter which removes the non-permitted sequences from the sequences generated when we select two phonemes going from left to right along the lines. The diagram may also generate three-member clusters, but in this case it overgenerates. The diagram gives a neat description of the Swedish sequences, and it is reasonable to use this approach in this case. It is obvious, however, that the size of the filter is a measure of the fit of the description to the data, and that nothing important has been said if the filter removes most of the sequences generated by the order diagram.

The following examples show how this approach works in some other languages. The two-consonant clusters in Fox interludes (cf. Hockett, 1955, p. 92) are:

pw, py, tw, kw, ky, sw, šw, hw, hy, mw, my, nw, ny, hp, ht, hk, hc, sk

In the Fox interludes presented, the relation 'precedes' is almost transitive on the entire set of consonants. If the cluster sy were present, it would be completely transitive.
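The two criteria and the filter can be checked mechanically. The Python sketch below is a hypothetical illustration, not the author's procedure: it builds the relation 'precedes' from the two-member clusters, tests asymmetry, computes the transitive closure, and takes the filter to be the pairs the closure adds. The Swedish clusters follow the list given in section 4; the Fox clusters are our transcription of the chart, in which the diacritics distinguishing s from š are partly uncertain. All function names are ours.

```python
import unicodedata


def relation(clusters):
    """The relation 'x precedes y' as a set of (x, y) pairs, one per two-member cluster."""
    pairs = set()
    for cluster in clusters:
        # NFC keeps a letter such as š as a single code point (one phoneme).
        x, y = unicodedata.normalize("NFC", cluster)
        pairs.add((x, y))
    return pairs


def reversible_pairs(rel):
    """Asymmetry criterion: pairs found in both orders, xy and yx (should be empty)."""
    return {(x, y) for (x, y) in rel if (y, x) in rel}


def transitive_closure(rel):
    """Add xz whenever xy and yz are present, until nothing new appears."""
    closure = set(rel)
    while True:
        implied = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if implied <= closure:
            return closure
        closure |= implied


def chains(order, length):
    """Sequences of `length` phonemes in which each phoneme precedes the next."""
    result = {(x,) for pair in order for x in pair}
    for _ in range(length - 1):
        result = {seq + (y,) for seq in result for (x, y) in order if x == seq[-1]}
    return {"".join(seq) for seq in result}


SWEDISH = ("tr pr kr fr dr br gr vr sl pl kl fl bl gl pj fj bj mj nj "
           "sv tv kv dv sm sn kn fn gn st sp sk").split()
FOX = "pw py tw kw ky sw šw hw hy mw my nw ny hp ht hk hc sk".split()

for name, clusters in [("Swedish", SWEDISH), ("Fox", FOX)]:
    rel = relation(clusters)
    order = transitive_closure(rel)
    gaps = sorted("".join(pair) for pair in order - rel)  # contents of the filter
    print(name, "reversible:", reversible_pairs(rel), "filter:", gaps)
    print(name, "three-member chains:", sorted(chains(order, 3)))
```

With these lists, neither language has reversible pairs, the Swedish filter comes out as sr, sj, kj, gj and the Fox filter as sy. Taking every path of three phonemes through the Swedish closure yields 19 candidates against the six attested three-member clusters, which is the overgeneration mentioned above; for Fox the same procedure yields exactly seven chains: hpy, hpw, hky, hkw, sky, skw, htw.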
From the point of view of sequential order, the missing cluster sy might be said to be an 'accidental gap' in Fox (although, of course, there may be other reasons for rejecting the cluster). It thus seems reasonable to describe the Fox interludes by the diagram in fig. 4. A box containing the exception sy has been drawn to the right of the diagram. This may be considered to be a representation of a 'filter' which excludes the cluster sy from the output of the diagram. The three-consonant interludes of Fox are the following: hpy, hpw, hky, hkw, sky, skw and htw. This is precisely the set of three-consonant clusters generated by the diagram. Furthermore, the set of Fox interludes which consist of a single consonant is just the set of consonants used to construct the diagram. Consequently, if we consider any one-, two- or three-consonant cluster obtained by traversing an arbitrary path of the diagram and filter to be allowable output, we find that the diagram generates exactly the full set of Fox interludes.

Fig. 4. Diagram generating medial sequences of one, two or three members in Fox.

The set of onsets in Fox is as follows: p, pw, py, t, tw, c, k, kw, ky, s, sw, š, šw, m, mw, my, n, nw, ny, w and y. To generate this set through the use of the preceding diagram would require a somewhat more complex filter to exclude the many interludes which do not occur as onsets.

The following medial clusters occur in Eskimo4: jp, yp, pq, pk, pt, pn, ps, fa, pi, jp, jk, jt, jim, /in, js, js, jl, yp, yq, yt, ym, yn, ys, ys, yl, tl, ts. ji is an allophone of j before a nasal. The diagram below shows the order relation inherent in these clusters. It generates all the clusters and no others if two consonants are selected along the lines going from left to right and those in the filter are removed.
Fig. 5. Diagram generating medial two-member clusters in Eskimo.

The clusters in the filter show some systematic features: labials do not combine, the palatal fricative does not combine with a velar stop, and the velar fricative does not combine with a palatal stop. The order inherent in the sequences permits us to arrange the consonants in columns (order classes5) in such a way that we get the permitted clusters by going from left to right selecting members of the different classes. The members of a class do not combine with each other, but members of different classes combine, although not all of them need do so. By a classification based only on order we lose some information about which combinations actually occur. The members of the different classes will often have phonetic features in common. However, it is often possible to arrange the phonemes in order classes in several ways if not all phonemes combine, which is often the case. The following arrangement shows one possible classification in Eskimo.

Fricatives (pal.-vel., lab.): j β y
Stops and Nasals: + s P k q t m n ?
Lat.: /

4 After Swadesh (1946, p. 31), j and y are palatal and velar fricatives respectively, k and q are palatal and velar stops respectively, s is articulated with the blade, s is articulated with the point of the tongue.

5 The application of order classifications in linguistic analysis has been discussed in detail in Brodda and Karlgren (1964).

4.4. Phonological grammars using string-replacement rules

String-replacement rules, or rewrite rules of the type used in syntactic descriptions,6 may also be used in phonotactic descriptions. Such grammars may take advantage of existing restrictions in position, combination or order in different ways.

6 Cf. Chomsky (1957). Phonotactic descriptions of this type are discussed in Bach (1964) and in the papers mentioned in the introduction.
Various classes of grammars of this type have been defined. The following grammar (a) is a context-free phrase structure grammar generating the initial sequences of Spanish: 0, p, b, t, d, k, g, f, r, l, ʎ, x, s, m, n, ñ, c, pr, br, kr, gr, fr, tr, dr, pl, bl, kl, gl, fl.
a /i*
V/
{T} a
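The rules of grammar (a) are only partly legible in this copy. As a hedged illustration of the technique, and not a reconstruction of (a), the following Python sketch defines hypothetical rewrite rules (the nonterminal names SEQ, SINGLE, OBS, DENT and LIQ are ours) and expands them exhaustively. The single consonants follow our reading of the Spanish list above, which is partly uncertain. The rules exploit a combination restriction visible in that list: p, b, k, g, f combine with both r and l, while t, d combine only with r.

```python
# Hypothetical rewrite rules (our illustration, not grammar (a) itself).
# Upper-case names are nonterminals; every other symbol is a terminal phoneme.
SINGLES = ["p", "b", "t", "d", "k", "g", "f", "r", "l", "ʎ",
           "x", "s", "m", "n", "ñ", "c"]

GRAMMAR = {
    "SEQ": [[], ["SINGLE"], ["OBS", "LIQ"], ["DENT", "r"]],  # [] rewrites SEQ as 0
    "SINGLE": [[c] for c in SINGLES],
    "OBS": [["p"], ["b"], ["k"], ["g"], ["f"]],  # may precede r or l
    "DENT": [["t"], ["d"]],                      # may precede r only
    "LIQ": [["r"], ["l"]],
}


def generate(symbols, grammar):
    """Rewrite the leftmost nonterminal in every possible way until only
    terminals remain; the grammar has no recursion, so this terminates."""
    for i, symbol in enumerate(symbols):
        if symbol in grammar:
            for rhs in grammar[symbol]:
                yield from generate(symbols[:i] + rhs + symbols[i + 1:], grammar)
            return
    yield "".join(symbols) or "0"  # the empty sequence is written 0


initial_sequences = set(generate(["SEQ"], GRAMMAR))
clusters = sorted(s for s in initial_sequences if s not in SINGLES and s != "0")
print(len(initial_sequences), clusters)
```

Stating the restriction once in the rules (OBS before LIQ, DENT before r) is what lets the grammar generate the twelve clusters without listing them one by one.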