Interactive voice response
== Technology ==

DTMF decoding and [[speech recognition]] are used to interpret the caller's response to voice prompts. DTMF tones are entered via the [[telephone keypad]]. Other technologies include [[Speech synthesis|text-to-speech]] (TTS), used to speak complex and dynamic information such as e-mails, news reports or weather information. IVR technology is also being introduced into automobile systems for hands-free operation. TTS is computer-generated synthesized speech that is no longer the robotic voice traditionally associated with computers: real voices create the speech in fragments that are spliced together (concatenated) and smoothed before being played to the caller.

An IVR can be deployed in several ways:
* Equipment installed on the customer premises
* Equipment installed in the PSTN (public switched telephone network)
* [[Application service provider]] (ASP) / hosted IVR

An [[automatic call distributor]] (ACD) is often the second point of contact when calling many larger businesses. An ACD uses digital storage devices to play greetings or announcements, but typically routes a caller without prompting for input. An IVR, by contrast, can play announcements and request input from the caller. This information can be used to profile the caller and can be used by an ACD to route the call to an agent with a particular skill set.

Interactive voice response can be used to front-end a [[call center]] operation by identifying the needs of the caller. Information such as an account number can be obtained from the caller. Answers to simple questions such as account balances, or pre-recorded information, can be provided without operator intervention.
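The DTMF decoding mentioned above is commonly implemented with the Goertzel algorithm, which measures the signal power at each keypad row and column frequency and maps the dominant pair to a digit. The following is a minimal illustrative sketch (the function names and frame length are this example's own, not from any particular IVR product):

```python
import math

# DTMF keypad: row (low) and column (high) frequencies in Hz
LOW_FREQS = [697, 770, 852, 941]
HIGH_FREQS = [1209, 1336, 1477, 1633]
KEYPAD = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(samples, freq, sample_rate):
    """Relative power of `freq` in `samples` via the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def decode_dtmf(samples, sample_rate=8000):
    """Return the keypad digit whose row/column tone pair dominates `samples`."""
    row = max(LOW_FREQS, key=lambda f: goertzel_power(samples, f, sample_rate))
    col = max(HIGH_FREQS, key=lambda f: goertzel_power(samples, f, sample_rate))
    return KEYPAD[LOW_FREQS.index(row)][HIGH_FREQS.index(col)]

def make_tone(digit, sample_rate=8000, n=400):
    """Synthesize a 50 ms DTMF tone for `digit` (used here for illustration)."""
    for r, row_digits in enumerate(KEYPAD):
        if digit in row_digits:
            f1, f2 = LOW_FREQS[r], HIGH_FREQS[row_digits.index(digit)]
            break
    return [math.sin(2 * math.pi * f1 * i / sample_rate) +
            math.sin(2 * math.pi * f2 * i / sample_rate) for i in range(n)]
```

Production decoders also apply thresholds and twist checks so that speech or noise is not mistaken for a keypress; this sketch only picks the strongest row and column tones.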
Account numbers from the IVR are often compared to [[caller ID]] data for security reasons, and additional IVR responses are required if the caller ID does not match the account record.<ref>{{Cite web|url=http://electronics.howstuffworks.com/interactive-voice-response.htm|title=How Interactive Voice Response (IVR) Works|publisher=How Stuff Works|author=Dave Roos|date=20 February 2008 }}</ref>

IVR call flows are created in a variety of ways. A traditional IVR depended upon proprietary programming or scripting languages, whereas modern IVR applications are generated in a similar way to [[World Wide Web|Web]] pages, using standards such as [[VoiceXML]],<ref name="VXML">{{cite web | url = http://www.w3.org/TR/voicexml21 | title = Voice Extensible Markup Language (VoiceXML) Version 2.1 | publisher = W3C}}</ref> [[CCXML]],<ref name="CCXML">{{cite web | url = http://www.w3.org/TR/ccxml | title = Voice Browser Call Control: CCXML Version 1.0 | publisher = W3C }}</ref> [[SRGS]]<ref name="SRGS">{{cite web | url = http://www.w3.org/TR/speech-grammar | title = Speech Recognition Grammar Specification Version 1.0 | publisher = W3C}}</ref> and [[Speech Synthesis Markup Language|SSML]].<ref name="SSML">{{cite web | url = http://www.w3.org/TR/speech-synthesis | title = Speech Synthesis Markup Language (SSML) Version 1.0 | publisher = W3C}}</ref> The ability to use XML-driven applications allows a [[web server]] to act as the [[application server]], freeing the IVR developer to focus on the call flow.
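As a sketch of what such a VoiceXML page can look like, the following hand-written fragment presents a simple menu and two target forms (the prompt wording and form names are invented for illustration, not taken from any deployment):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- A menu prompts the caller and routes to a form based on the reply -->
  <menu>
    <prompt>How can I help you? Say one of: <enumerate/></prompt>
    <choice next="#balance">account balance</choice>
    <choice next="#orders">order status</choice>
  </menu>
  <form id="balance">
    <block><prompt>Here is your account balance.</prompt></block>
  </form>
  <form id="orders">
    <block><prompt>Here is your order status.</prompt></block>
  </form>
</vxml>
```

A web server can generate such a page dynamically, just as it would generate HTML, which is what allows it to act as the application server for the IVR.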
IVR speech recognition interactions (call flows) are designed using three approaches to prompt for and recognize user input: directed, open-ended, and mixed dialogue.<ref name="advances in commercial deployment">{{cite book | last = Suendermann | first = David | title = Advances in Commercial Deployment of Spoken Dialog Systems | url = https://archive.org/details/advancescommerci00suen | url-access = limited | publisher = [[Springer Science+Business Media]] | location = Berlin | date = 2011 | pages = [https://archive.org/details/advancescommerci00suen/page/n21 9]–11 | isbn = 9781441996107}}</ref><ref name="advances in commercial conversational agents">{{cite book | last = Perez-Marin | first = Diana | title = Conversational Agents and Natural Language Interaction: Techniques and Effective Practices | publisher = IGI Global | location = Hershey, Pennsylvania | date = 2011 | page = 340 | isbn = 9781441996107}}</ref><ref name="w3c aural presentation">{{cite web | title = Presentation of Information - Aurally | publisher = W3C | url = https://www.w3.org/TR/voicexml/#s6.5 | access-date = 26 October 2016 }}</ref>

A directed dialogue prompt communicates a set of valid responses to the user (e.g. "How can I help you? ... Say something like, ''account balance, order status,'' or ''more options''"). An open-ended prompt does not communicate a set of valid responses (e.g. "How can I help you?"). In both cases, the goal is to glean a valid spoken response from the user. The key difference is that with directed dialogue, the user is more likely to speak an option exactly as it was communicated by the prompt (e.g. "account balance"). With an open-ended prompt, however, the user is likely to include extraneous words or phrases (e.g. "I was just looking at my bill and saw that my balance was wrong."). The open-ended prompt requires a greater degree of [[natural language processing]] to extract the relevant information from the phrase (i.e. "balance").
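The contrast between the two prompt styles can be sketched in a few lines of Python. This is a toy illustration: the option set and the keyword-to-intent table are invented for the example, and real systems use statistical language models rather than exact string matching.

```python
# Directed dialogue: the reply is expected to match a prompted option.
# Open-ended dialogue: a known keyword must be spotted in free-form speech.
VALID_OPTIONS = {"account balance", "order status", "more options"}
INTENT_KEYWORDS = {"balance": "account balance", "order": "order status"}

def recognize_directed(utterance):
    """Return the matched option, or None if the reply is off-script."""
    u = utterance.lower().strip()
    return u if u in VALID_OPTIONS else None

def recognize_open_ended(utterance):
    """Scan free-form speech for the first known intent keyword."""
    for word in utterance.lower().split():
        intent = INTENT_KEYWORDS.get(word.strip(".,!?"))
        if intent is not None:
            return intent
    return None
```

The directed recognizer fails on anything but an exact option, while the open-ended one tolerates extraneous words at the cost of a larger vocabulary to maintain.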
Open-ended recognition also requires a larger [[grammar]] set, which accounts for a wider array of permutations of a given response (e.g. "balance was wrong", "wrong balance", "balance is high", "high balance"). Despite the greater amount of data and processing required for open-ended prompts, they are more interactively efficient, as the prompts themselves are typically much shorter.<ref name="advances in commercial deployment"/>

A mixed dialogue approach involves shifting from open-ended to directed dialogue, or vice versa, within the same interaction, as one type of prompt may be more effective in a given situation. Mixed dialogue prompts must also be able to recognize responses that are not relevant to the immediate prompt, for instance when a user decides to shift to a function different from the current one.<ref name="w3c aural presentation"/><ref name="advances in commercial conversational agents"/>

Higher-level IVR development tools are available to further simplify the application development process. A call flow diagram can be drawn with a GUI tool, and the presentation layer (typically VoiceXML) can be generated automatically. In addition, these tools normally provide extension mechanisms for software integration, such as an HTTP interface to a website and a [[Java (programming language)|Java]] interface for connecting to a database.

In [[telecommunications]], an '''audio response unit''' (ARU), often included in IVR systems, is a device that provides synthesized voice responses to DTMF keypresses by processing calls based on (a) the call-originator input, (b) information received from a database, and (c) information in the incoming call, such as the time of day. ARUs increase the number of information calls handled and provide consistent quality in information retrieval.
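The open-ended grammar permutations discussed above (e.g. "wrong balance", "balance was wrong") could be expressed in the SRGS format roughly as follows (an illustrative sketch; the rule name is invented for the example):

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="balance_complaint">
  <rule id="balance_complaint" scope="public">
    <one-of>
      <item>wrong balance</item>
      <item>high balance</item>
      <item>balance
        <!-- optional copula: "balance wrong", "balance is high", ... -->
        <item repeat="0-1"><one-of><item>is</item><item>was</item></one-of></item>
        <one-of><item>wrong</item><item>high</item></one-of>
      </item>
    </one-of>
  </rule>
</grammar>
```

Each additional phrasing a caller might use adds items like these, which is why open-ended grammars grow much larger than directed ones.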