=== Text-to-phoneme challenges ===

{{Unreferenced section|date=April 2023}}

Speech synthesis systems use two basic approaches to determine the pronunciation of a word from its [[spelling]], a process often called text-to-phoneme or [[grapheme]]-to-phoneme conversion ([[phoneme]] is the term used by [[Linguistics|linguists]] for the distinctive sounds of a [[language]]). The simpler of the two is the dictionary-based approach, in which the program stores a large dictionary containing the words of a language and their correct [[pronunciation]]s. To determine a pronunciation, the program looks up each word in the dictionary and replaces its spelling with the pronunciation given there. The other approach is rule-based: pronunciation rules are applied to a word's spelling to derive its pronunciation. This is similar to the "sounding out", or [[synthetic phonics]], approach to learning to read.

Each approach has advantages and drawbacks. The dictionary-based approach is quick and accurate, but fails completely when given a word that is not in its dictionary. As the dictionary grows, so too do the memory requirements of the synthesis system. The rule-based approach, on the other hand, works on any input, but the complexity of the rules grows substantially as the system takes irregular spellings and pronunciations into account. (Consider that the word "of" is very common in English, yet is the only word in which the letter "f" is pronounced {{IPA|[v]}}.) As a result, nearly all speech synthesis systems use a combination of these approaches.
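
A combined approach can be sketched as a dictionary lookup that falls back to rules for unknown words. The following is a minimal illustration rather than the method of any particular synthesizer: the lexicon entries, the ARPAbet-style phoneme symbols, the one-letter-per-sound rules, and the names <code>LEXICON</code>, <code>LETTER_RULES</code>, <code>rule_based</code> and <code>to_phonemes</code> are all assumptions made for the example.

```python
# Toy pronunciation dictionary (hypothetical entries, ARPAbet-style symbols).
LEXICON = {
    "of": ["AH", "V"],        # irregular: "f" pronounced as [v]
    "cat": ["K", "AE", "T"],
}

# Naive one-letter-to-one-sound rules (an illustrative assumption;
# real rule sets are context-sensitive and far larger).
LETTER_RULES = {
    "a": "AE", "b": "B", "d": "D", "f": "F", "i": "IH",
    "k": "K", "n": "N", "o": "AA", "s": "S", "t": "T",
}


def rule_based(word):
    """Map each letter to a phoneme; letters without a rule are skipped."""
    return [LETTER_RULES[ch] for ch in word.lower() if ch in LETTER_RULES]


def to_phonemes(word):
    """Use the dictionary when the word is listed, otherwise fall back to rules."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])  # copy so callers cannot mutate the lexicon
    return rule_based(key)


if __name__ == "__main__":
    for w in ("of", "fin"):
        print(w, to_phonemes(w), rule_based(w))
```

In this sketch the dictionary entry for "of" overrides the rules, which on their own would assign the letter "f" its regular sound, while a word absent from the dictionary, such as "fin", is still pronounced by the rules.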

Languages with a [[phonemic orthography]] have a very regular writing system, so the pronunciation of a word can usually be predicted reliably from its spelling. Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for the few words, such as foreign names and loanwords, whose pronunciations are not obvious from their spellings. By contrast, speech synthesis systems for languages like English, which have extremely irregular spelling systems, are more likely to rely on dictionaries and to use rule-based methods only for unusual words or words that are not in their dictionaries.