How to build a conversational IVR with Fonos and Dialogflow.

A tutorial covering the essentials of building a voice application with Project Fonos.

Pedro Sanders
5 min read · Nov 8, 2020
A person talks with a conversational bot.

Building a Conversational Programmable Voice Application (C-PVA) can be challenging and is even more difficult when you need it to scale.

Project Fonos can reduce the complexity of creating conversational IVR applications and significantly reduce time to market.

Project Fonos provides all the tools and components to create CPaaS (Communications Platform as a Service) applications without vendor lock-in.

This tutorial will cover the steps to create a conversational chatbot with Project Fonos and Dialogflow. We will then connect the chatbot to an application built on top of Fonos and then deploy it to a working Project Fonos deployment.

Overview

Before we dive in, we’ll first take a quick look at the final product — a C-PVA to handle reminders. Project Fonos will handle the call and pass the controls to Dialogflow for smooth person-to-bot interaction.

Architecture

There are usually three actors in a C-PVA system.

Callers

The first and perhaps easiest to explain is the Callers. Simply put, these are the users placing the call.

Telephony System

Next is the Telephony System, which will be managed by Fonos and is made up of several components. The most important are the SIP Proxy, the Media Server, and the Media Controller.

Project Fonos uses Routr I/O as its SIP Proxy, which consists of a server that receives registration requests from SIP endpoints and routes the call from an endpoint.

The Media Server is responsible for answering calls, recording, queues, conferences, and so on. We will be using Asterisk, but we could have easily used FreeSWITCH or another solution.

Finally, there is the Media Controller, a server that allows us to create user-defined functions or applications to control the flow of a call. Functions typically come in the form of a verb, such as Play, Say, Hangup, Record, etc.
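To make the idea of verbs concrete, here is a minimal sketch of a call flow written as a sequence of verbs. The `channel` object and its `say`, `play`, and `hangup` methods are hypothetical stand-ins used for illustration only; they are not the actual Fonos API.

```javascript
// Hypothetical channel that only logs each verb. A real media controller
// would translate these calls into commands for the media server.
function createLoggingChannel() {
  const log = [];
  return {
    log,
    say: (text) => log.push(`Say: ${text}`),
    play: (file) => log.push(`Play: ${file}`),
    hangup: () => log.push("Hangup"),
  };
}

// A user-defined function: the call flow is just a sequence of verbs.
function greetAndHangup(channel) {
  channel.play("welcome-tone");
  channel.say("Thanks for calling. Goodbye.");
  channel.hangup();
}

const channel = createLoggingChannel();
greetAndHangup(channel);
console.log(channel.log.join("\n"));
```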

Bot Engine

At the other end of the call, we have a “Bot Engine” to control the conversation. This is where most of the logic is and where all the heavy lifting is done.

Although I’m using Dialogflow for this tutorial, you could easily replace it with AWS Lex or even Rasa for a fully open-source solution.

Building a conversational bot with Dialogflow

Before we dive into creating our bot, let’s briefly touch on some essential terms.

  • Agent: A virtual agent that controls communications with your end-users. It uses NLU (Natural Language Understanding) to capture the subtleties of the human language.
  • Intent: Simply put, this is the intention or purpose of the user for one conversation turn.
  • Entity: Entities are a mechanism for identifying and obtaining valuable data from natural-language inputs.
  • Fulfillment: A fulfillment is a matching response to the user’s intent, typically deployed as a webhook.
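These terms map onto the response Dialogflow returns for each conversation turn. The sketch below uses a hand-written object shaped like the REST JSON form of a Dialogflow ES `queryResult`; the intent name, parameters, and texts are invented sample values, and `summarizeTurn` is a hypothetical helper.

```javascript
// Sample data only: shaped like the REST JSON form of a Dialogflow ES
// queryResult. All values are invented for illustration.
const sampleQueryResult = {
  queryText: "remind me to call mom tomorrow",
  intent: { displayName: "reminders.add" },
  parameters: { name: "call mom", "date-time": "tomorrow" },
  fulfillmentText: "Okay, I will remind you to call mom tomorrow.",
};

// Pull out the pieces that correspond to the terms above:
// the matched intent, the extracted entities, and the reply text.
function summarizeTurn(queryResult) {
  return {
    intent: queryResult.intent ? queryResult.intent.displayName : null,
    entities: queryResult.parameters || {},
    reply: queryResult.fulfillmentText || "",
  };
}

console.log(summarizeTurn(sampleQueryResult));
```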

If you are new to Dialogflow, it might be a good idea to use a pre-built bot. For simplicity, I will use the pre-built agent named “Reminders.”

Agent settings in Dialogflow Essentials

If you look at the “Intents” section, you will notice a list of pre-populated intents, including add, get, and delete. I will also add a new intent — close — to signal the session is over and that the bot should finish the call.

The current list of intents:

If you expand any intent, you will also see the training phrases used for each. Feel free to add to these phrases to enrich the training.

Before we get into the code, now is an excellent time to make sure your bot works as expected using the “Try it now” panel.

Integration with Project Fonos

Next, we will need a valid set of credentials from Google Cloud. At a minimum, you’ll need to enable the Text-to-Speech, Speech-to-Text, and Dialogflow APIs.

To enable an API for your project:

  1. Go to the API Console.
  2. From the projects list, select a project, or create a new one.
  3. If the APIs & services page isn’t already open, open the console left side menu, select APIs & services, and then select Library.
  4. Click and enable Cloud Text-to-Speech, Cloud Speech-to-Text, and Dialogflow.

After that, go to the Create Service Account Key page, create a new service account, and download its JSON key.

Save the file as google_credentials.json at the root of your project.
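Before wiring the key into the application, it can help to check that the file looks like a usable service account key. The sketch below assumes the usual fields of a Google service account key file (`type`, `project_id`, `client_email`, `private_key`); the helper names `missingKeyFields` and `loadCredentials` are hypothetical.

```javascript
const fs = require("fs");

// Returns the required fields that are missing from a parsed key object.
function missingKeyFields(key) {
  const required = ["type", "project_id", "client_email", "private_key"];
  return required.filter((field) => !key[field]);
}

// Reads the key file and throws a descriptive error if it looks incomplete.
function loadCredentials(path) {
  const key = JSON.parse(fs.readFileSync(path, "utf8"));
  const missing = missingKeyFields(key);
  if (missing.length > 0) {
    throw new Error(`${path} is missing: ${missing.join(", ")}`);
  }
  return key;
}

// Only attempt the check when the file is present in the current directory.
if (fs.existsSync("google_credentials.json")) {
  const key = loadCredentials("google_credentials.json");
  console.log(`Loaded credentials for project ${key.project_id}`);
}
```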

Next, you’ll need the @fonos/ctl command-line tool properly configured against your Fonos deployment.

Create a new directory, and run the command fonos apps:init using all the default parameters.

The init command will instantly create a bootstrapped application to run in Project Fonos.

Next, install the GoogleASR, GoogleTTS, and Dialogflow modules with:

npm i --save @fonos/googleasr @fonos/googletts @google-cloud/dialogflow

Open the project in your favorite IDE and create a script named talk.js with the following content:

This script takes the provided text and executes a callback with Dialogflow’s response.
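A minimal sketch of what such a script could look like, assuming the `SessionsClient` from `@google-cloud/dialogflow` and its `projectAgentSessionPath` and `detectIntent` methods. The factory name `createTalk`, the injected `sessionClient`, and the `projectId`, `sessionId`, and `languageCode` options are assumptions made for illustration. The stub client at the bottom only imitates the shape of a response so the sketch runs without credentials.

```javascript
// talk.js (sketch). In a real app the client would come from:
//   const { SessionsClient } = require("@google-cloud/dialogflow");
//   const sessionClient = new SessionsClient({ keyFilename: "google_credentials.json" });
function createTalk(sessionClient, { projectId, sessionId, languageCode = "en-US" }) {
  const session = sessionClient.projectAgentSessionPath(projectId, sessionId);

  // Sends `text` to Dialogflow and calls back with (err, queryResult).
  return function talk(text, callback) {
    const request = {
      session,
      queryInput: { text: { text, languageCode } },
    };
    return sessionClient.detectIntent(request).then(
      ([response]) => callback(null, response.queryResult),
      (err) => callback(err)
    );
  };
}

// Stub that mimics the two client methods used above (assumption: no network).
const stubClient = {
  projectAgentSessionPath: (projectId, sessionId) =>
    `projects/${projectId}/agent/sessions/${sessionId}`,
  detectIntent: async (request) => [
    {
      queryResult: {
        queryText: request.queryInput.text.text,
        fulfillmentText: "Hi!",
      },
    },
  ],
};

const talk = createTalk(stubClient, { projectId: "my-project", sessionId: "call-1" });
talk("hello", (err, result) => {
  if (err) throw err;
  console.log(result.fulfillmentText);
});
```

Passing the client in, rather than creating it inside the function, keeps the script easy to try out locally before pointing it at a real agent.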

Next, create a script named config.js with the following content:

Finally, you will need to edit the entry point script index.js to look like the code below:
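A rough sketch of the conversational loop such an entry point might implement: listen to the caller, send the transcript to Dialogflow, speak the reply, and hang up once the close intent matches. The `voice` object with its `say`, `listen`, and `hangup` methods is a hypothetical stand-in rather than the Fonos voice API, `talk(text, callback)` is assumed to call back with `(err, queryResult)`, and the intent name `close` is assumed to match the intent added earlier.

```javascript
const { promisify } = require("util");

// Runs one call: greet, then loop until the bot signals the session is over.
async function handleCall(voice, talk, maxTurns = 10) {
  const ask = promisify(talk);
  await voice.say("Hi, how can I help you with your reminders?");
  for (let turn = 0; turn < maxTurns; turn++) {
    const transcript = await voice.listen(); // speech-to-text
    if (!transcript) continue; // nothing heard, try again
    const result = await ask(transcript); // intent detection
    await voice.say(result.fulfillmentText); // text-to-speech
    if (result.intent && result.intent.displayName === "close") break;
  }
  await voice.hangup();
}

// Fakes that let the loop run without a phone call or a Dialogflow agent.
const heard = ["remind me to call mom", "that's all"];
const spoken = [];
const fakeVoice = {
  say: async (text) => { spoken.push(text); },
  listen: async () => heard.shift() || "",
  hangup: async () => { spoken.push("[hangup]"); },
};
const fakeTalk = (text, callback) =>
  callback(
    null,
    text === "that's all"
      ? { fulfillmentText: "Goodbye!", intent: { displayName: "close" } }
      : { fulfillmentText: "Reminder saved.", intent: { displayName: "add" } }
  );

const done = handleCall(fakeVoice, fakeTalk).then(() =>
  console.log(spoken.join("\n"))
);
```

The `maxTurns` limit is a design choice in this sketch so that a call cannot loop forever if the close intent never matches.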

Once these changes have been implemented, run the command fonos apps:deploy from the application directory.

Interact with your new conversational IVR

The final piece of this puzzle is to interact with the bot by voice. You must now assign your app to a phone number so that it can receive inbound calls. For that, you can use fonos numbers:create or fonos numbers:edit and answer the prompts.

Once you have established the relation between your phone number and your app, you are ready to call your voice application.

Conclusion

As you can see, the gap between proprietary and open-source CPaaS is shrinking. Once tools like Project Fonos reach maturity, there will be no reason to use proprietary CPaaS anymore.


Pedro Sanders

Founder at @fonoster · Building the Open-Source alternative to Twilio · Posts about my journey with COSS & SaaS.