Finite State Optimality Theory: Final Obstruent Devoicing

| tech

Recently I read an article by Gerdemann and Hulden (2012) in which they provide finite-state implementations for two Optimality Theory (OT) analyses1 together with an easy proof of nonregularity of OT grammars. This post is an annotated walkthrough of their OT implementation for word-final obstruent devoicing (e.g., German /lob/ > [lo​ːp]), a phonological process that is observed in typologically diverse languages, which makes it a prototypical showcase for the universal constraints proposed by OT. For their implementation, they use the finite-state toolkit foma (Hulden, 2009). I will introduce necessary notation informally as it comes up, the following table only lists foma's deviations from the POSIX regex standard:

Notation Description
~A Complement
? Any symbol in the alphabet
% Escape symbol
[ ] Grouping brackets
( ) Parentheses surround optional parts


The Generator Gen implemented for this analysis describes all possible deletions (elisions), insertions (epentheses), and substitutions. To keep track of each modification, "every surface segment (output) is surrounded by brackets [...], every input segment that was manipulated by Gen is surrounded by parantheses (...)", so we have four basic cases:

The following piece of code compiles a regular expression into a finite-state transducer (FST) that can be accessed later as Gen.

Gen = S -> %( ... %) %[ (S) %]      ,,
      S -> %[ ... %]                ,,
      [..] (->) [%( %) %[ S %]]*    ;

The regex consists of three parallel rules delimited by ,,. Each rule rewrites its left-hand side (before the arrow ->) into its right-hand side. The symbol S is a variable for a transducer. We can think of it as the set of available segments. For the examples from the paper, this could simply be

S = [p|t|k|b|d|g|a|e|i|o|u];

The first rule creates all possible deletions and substitutions. The three dots ... are a backreference to the input. If S is the set of available segments, for a single input segment we get |S| + 1 outputs, one for each segment and one with empty square brackets. The second rule just adds square brackets around its input (identity case). The third rule produces an infinite number of insertions. Square brackets with two dots [..] denote an empty left-hand side (nothing is replaced by something).


Generator automaton as defined by the three rules above.


Now that we have an infinite number of candidates, we need to define our ranked set of constraints to evaluate them. Here I just copy the definitions from the paper:

The first constraint (*VF) obviously is responsible for the devoicing of final obstruents. The second constraint (IdentV), ranked lower than the first, rules out unnecessary substitutions. VOP never decides the race since it's ranked lower than the faithfulness contraint IdentV.

If we were only dealing with candidates that are minimal pairs such as *bed and bet (where /bed/ is the underlying form), the three constraints above would suffice. However, Gen produces all possible insertions, deletions and substitutions, so we need additional faithfulness constraints to rule out forms such as *bede (epenthesis), *be (deletion), and *bek (substitution changing voicedness and place of articulation). These incorrect forms are avoided with the following three constraints:

On page 14 in the paper, all six constraints are shown in action in an OT tableau.

Let's move on to the FST implementation. The authors define two helpful templates for matching the context of constraint violations.

def Change(X,Y) [%( X %) %[ Y %]];

def Surf(X) [X .o. [0:%[ ? 0:%]]*].l/
            [ %( (S) %) | %[ %] ];

The first template generates a simple substitution (X)[Y]. The second definition is more complex, we'll work through it from the inside out:

So Surf(X) accepts all possible surface forms of expression X (and many invalid forms, but this is unproblematic because Gen does not produce them).

Furthermore, the authors define four segment subgroups: vowels (V), voiced consonants (CVOI), voiced and unvoiced phones.

V           = [a|o|u|i|e];
CVOI        = [b|d|g];
Voiced      = [CVOI|V];
Unvoiced    = [p|t|k];

With these groups and the Change template, they define regular expressions to recognize surface forms of two phonological processes: change of voice (VC) and change of place of articulation (Place).

define VC     Change(Voiced,Unvoiced) | 
define Place  Change(p,?-b) | Change(t,?-d) |
              Change(k,?-g) | Change(b,?-p) |
              Change(d,?-t) | Change(g,?-k) |
              Change(a,?)   | Change(e,?)   |
              Change(i,?)   | Change(o,?)   |

The language of the first expression contains words such as (b)[p], replacements of a voiced obstruent with its unvoiced counterpart, or (k)[a], replacements of a voiceless obstruent with a vowel. The second expression generates substitutions such as (p)[k], (i)[b] etc., but not (p)[b] or (d)[t], where the place of articulation is the same.

These intermediate expressions are now used to define the preceding context of the following four replacement rules that are constrained by context, more specifically A -> B || L _ R only applies if L precedes A and R follows A in the input (all other input passes unchanged).

VF      = [..] -> {*} || Surf(CVOI) _ .#. ;
IdentV  = [..] -> {*} || VC _ ;
VOP     = [..] -> {*} || Surf(CVOI) _ ;
IdentPl = [..] -> {*} || Place _ ;

VF inserts an asterisk after any surface form of a voiced consonant before a word boundary. IdentV stars surface forms that indicate a change of voice. VOP is a more general version of VF that doesn't require a word boundary. IdentPl stars surface forms that indicate a change of place.

Finally the constraints forbidding insertion (Dep) and deletion (Max) are easy to implement. In the former case we star empty parantheses, in the latter empty square brackets.

Dep   = [..] -> {*} || %( %) _ ;
Max   = [..] -> {*} || %[ %] _ ;


The final part of any OT analysis is Eval, the evaluation of the hierarchically ranked constraints. We move from the highest ranked constraint to the lowest ranked, at each step eliminating all but those candidates that incur the minimal number of violations. To do this we need a finite-state method of computing the set of candidates that have the minimal number of violations. Counting doesn't seem like a viable solution, since we want an automaton with a finite set of states and we are dealing with an infinite number of inputs of unknown length. The authors use a clever trick that they call worsening to solve this problem. They define a transducer that adds any number of violations (stars) in any possible position to each candidate.

AddViol = [?* 0:%* ?*]+;

If you compose the inverse of this transducer with a constraint that adds violations to candidates, it will filter out all candidates with a non-minimal number of violations. Let's look at a simple example. We have a set of two candidates S1 = {a*, a***} and want to retain the candidate with the minimal number of stars (in this case * is a literal symbol, not a Kleene star). Applying AddViol to S1 generates the language S2 that consists of all words with a single a and at least two stars. Now we intersect S1 with the complement of S2 (assuming the universe is any string over our two-symbol alphabet), which gives us S1∩ S2c = {a*}, the subset of words in S1 with the minimal number of stars.

I glossed over one little complication of the worsening approach: the candidates to which we add stars differ from each other not only by the number of stars. So if we have a candidate (t)[d] without any star and a second candidate (t)[]* with one star, we can add as many stars to the first candidate as we want, it will never match the second (nor will any other candidate). The authors solve this by composing the inverse Gen with Gen (ignoring any stars) and adding violations to every possible configuration afterwards.2

Worsen = [Gen.i .o. Gen]/%* .o. AddViol;

For (t)[d] the inverse generator Gen.i outputs t, for which Gen produces [t], (t)[], (t)[d], and so on. AddViol then produces starred variations, one of which will be (t)[]*.

So to filter out candidates with a non-minimal number of violations, we compose the previous transducer with the extracted range of its worsened complement. After this step, we remove all stars because each candidate will have the same number and we don't need them for the evaluation of the next constraint.

def Eval(X) X .o. ~[X .o. Worsen].l .o. %* -> 0;

The output of this evaluation will be the input for the evaluation of the next constraint etc. After all evaluations have been performed, remaining markup is removed with the Cleanup transducer.

Cleanup     = %[|%] -> 0 .o. %( \%)* %) -> 0;

def OTGrammar S+ .o. Eval(Eval(Eval(Eval(Eval(Eval(
    Gen .o. Dep) .o. Max) .o. IdentPl) .o.
    VF) .o. IdentV) .o. VOP) .o. Cleanup;

S+ restricts the input to non-empty strings over the segments defined above. After compiling the whole grammar, foma outputs this nice graphical representation via the view command.


FST for final obstruent devoicing analysis in OT.

Rule-based implementation

Instead of the OT analysis, we can translate the classical phonological rewrite rules directly into a transducer. (Three replacements that apply in the same context.)

Rule = b -> p, d -> t, g -> k || _ .#. ;
RuleGrammar S+ .o. Rule;

This transducer is weakly equivalent to the OT transducer (produces the same language), but is structurally different, as can be seen from its graphical representation.


FST for final obstruent devoicing from phonological rules.

Because both the OT transducer and the rule transducer represent a function, we can test for their equivalence: (1) they apply to the same domain, and (2) the composition of the inverse of one with the other is an identity transducer.

regex RuleGrammar.i .o. OTGrammar;
test identity 
# 1 (1 = TRUE, 0 = FALSE)


From an engineering point of view, the OT analysis doesn't seem appealing compared to the classical rewrite rules. We need many definitions that are far more complex, the resulting automaton is more complex, and for some OT grammars, a finite-state implementation is impossible. Theoretical linguists on the other hand usually don't care too much about the computational complexity of their analyses and frameworks, for them OT is appealing because it allows them to trace certain properties of languages back to universal constraints rooted in articulation and perception (in the case of phonology).

Still it seems to me that linguists could profit from implementing their analyses with finite-state technology.

  1. Implementing a functional transducer forces us to turn implicit assumptions into explicit constraints. We have seen this with the extra set of faithfulness constraints that were necessary to avoid suboptimal forms.
  2. The computer can help ensuring that the analysis is correct and works as intended.
  3. Once we have a working FST implementation, we can use it in both directions, that is, we can transform an underlying form into a surface form, or generate possible underlying forms for a given surface form.

1 Based on the matching approach developed by Gerdemann and Van Noord (2000).

2 In general, violations do not necessarily line up. For these cases the authors introduce a violation-permuting transducer that needs to be applied the right number of times. However, this problem does not affect the devoicing analysis, so I ignore it here.

Gerdemann, D., & Hulden, M. (2012). Practical finite state optimality theory. In Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing (pp. 10-19).

Gerdemann, D., & Van Noord, G. (2000). Approximation and exactness in finite state optimality theory. In Proceedings of the Fith Workshop of the ACL Special Interest Group in Computational Phonology (pp. 34-45).

Hulden, M. (2009). Foma: a finite-state compiler and library. In Proceedings of the Demonstrations Session at EACL 2009 (pp. 29-32).

This page was moved from my old website.