Conversational Interfaces: Where Are We Today? Where Are We Heading?
Introduction:
It’s hard to find a place today without some interface that lets a person and a computer system converse naturally, either through voice or text. By conversational interface I mean any system that lets users interact with a computer by giving their input in a natural language, such as English. The primary use for this type of interface is automating or simplifying existing tasks where users expect the computer to understand and respond appropriately.
Today conversational interfaces are everywhere: on social networking sites such as Facebook, where you can send messages via Chat; on your mobile phone through text messaging; or in Siri on an iPhone, where you can simply say “Send Text Message”. You can also use Google Voice Search on the latest Android phones, which uses natural language input to perform searches, make calls, send texts and more.
Natural language processing is a complex science and is still limited in precision. However, the tools are already good enough to provide a great experience for users. This article explores the state of conversational interfaces today, where we are likely headed with these types of interfaces in the future, and what companies like FizzPop are doing to advance this technology.
The Current State:
State-of-the-art conversational interfaces can be divided into two main categories: voice and text. Within each category there are a number of techniques that work very well in certain situations but not as well in others. I will cover what is possible with today’s technology and the types of applications which lend themselves to each category and technique. Keep in mind that this is a field which has been evolving rapidly, and the following chart shows an approximate timeline for certain key events:
Voice Conversational Interfaces:
The most straightforward way to implement a conversational interface is with speech recognition/voice recognition (where you speak your commands into the computer). As you can imagine, there are a number of challenges in recognizing and translating speech into commands which computers can understand.
Early voice recognition systems were unreliable and very difficult for users to train. However, with the availability of cloud-based computing platforms such as Amazon’s EC2, it is now possible to provide high-quality, out-of-the-box voice recognition at an affordable cost.
Voice recognition systems can be designed and trained to recognize and respond to a limited number of voice commands (such as those provided by Nuance’s Dragon products). For example, you might configure Dragon with a rule such as: if the user says “I would like to go out for dinner”, then send an email message reading “Where do you want to go?”. These types of voice-activated systems are useful for specific applications, but they can be difficult to customize because each command has to be explicitly configured.
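To make the "each command has to be explicitly configured" point concrete, here is a minimal Python sketch of a fixed-command rule table. It is only an illustration: it works on text that has already been transcribed, it does not use Dragon's actual configuration format or API, and the names RULES, normalize and respond are hypothetical.

```python
import string

# Hypothetical rule table: each recognized phrase must be listed explicitly,
# together with the message to send in response.
RULES = {
    "i would like to go out for dinner": "Where do you want to go?",
}

def normalize(utterance):
    """Lowercase the text, drop punctuation, and collapse repeated spaces."""
    cleaned = utterance.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())

def respond(utterance, rules=RULES):
    """Return the configured message, or None when no rule matches."""
    return rules.get(normalize(utterance))
```

Anything that is not listed word for word, such as "Let's grab lunch", simply returns None, which is why this style of system is hard to customize.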
Voice recognition systems which allow natural language input (such as Apple’s Siri) are much more flexible, but they require significantly greater computing power, which is typically supplied by remote servers rather than the device itself. At this point in time they are mostly offered on mobile devices.
Text Conversational Interfaces:
Natural language text processing is much easier than voice recognition because computers today already handle written English fairly well. The challenge with text-based conversational interfaces is giving users a way to launch these types of applications. For example, the user might install a “Conversation” app on their mobile device or social networking site, or they might install a “Conversational Interface” browser plug-in.
These text-based conversational interfaces usually rely on a domain-specific language (DSL) designed to work in a specific application. For example, you might write a script for searching the web using DuckDuckGo, or for playing one of the better IRC-based games.
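As a rough sketch of what such a domain-specific script could look like, the Python below accepts a single command, "search <terms>", and turns it into a DuckDuckGo search URL. The command name and the handle and search_url functions are assumptions made for this example; only the https://duckduckgo.com/?q=... address format comes from DuckDuckGo itself.

```python
from urllib.parse import urlencode

def search_url(query):
    """Build a DuckDuckGo search URL for the given terms."""
    return "https://duckduckgo.com/?" + urlencode({"q": query})

def handle(line):
    """Interpret one line of a tiny, hypothetical command language."""
    verb, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    if verb.lower() == "search" and rest:
        return search_url(rest)
    # Anything outside the small vocabulary is rejected with a hint.
    return "Sorry, I only understand: search <terms>"
```

For instance, handle("search conversational interfaces") returns a URL the app can open, while any other input gets the hint message. The narrow vocabulary is what makes the interface domain specific.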
I have been working with text-based conversational interfaces for the last couple of years. One of my favorite examples is a social network I built which allows you to chat through Facebook’s messaging system. This service requires very little computing power because it runs entirely in your browser, and it requires little training because you already know how to use Facebook’s instant messaging system.
How Will We Use Conversational Interfaces in the Future?
The last 5 years have seen an explosion of technology that has dramatically changed how we interact with computers, mobile devices, social networks, etc. I think that natural language processing will continue to improve, but I think it is unlikely that we will be chatting with computers using voice recognition.
I predict that the next wave of computer interfaces will center around mobile devices, because they are always available and they are inherently social. We already have mobile apps for Google Now, Siri, Cortana, etc., but I expect that the next major leap in mobile apps will come from developers who are building new types of applications which require conversational interfaces.
For example, imagine a new type of app designed to provide support for those with special needs, such as an app which provides information about emergency exits on public transportation (e.g. “If you are on a plane, the first exit is located on your left”).
There might also be new types of apps which help family members stay in touch with one another, or chatbots that use conversational interfaces to let users order drinks at their local pub. I also expect to see new types of educational apps which teach children how to read, write, etc.
What are the Challenges with Conversational Interfaces?
One of the major challenges will be giving users enough information to find out what types of chatbots are available. I think this need for discovery will lead to an explosion of new types of social networks.
Challenges:
I think it will be very difficult for search engines to index these new types of apps. This is one reason I believe that they are unlikely to be widely used within the next 5 years. For example, if you type “Conversational Interfaces” into Google, there are only 2 sites which show up in the first 100 results, and both of those sites are chatbot directories.
A second challenge will be providing a simple way for chatbots to send information back to their creators. For example, a user might enter a request for an action into a chatbot, e.g. “please book me a flight from New York to London”. This type of request could be sent to a chatbot directory, which then allows users to choose which type of chatbot they want to use.
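One way to picture the directory idea is a simple keyword lookup that maps a request to the chatbot types able to handle it. The Python below is a speculative sketch, not a description of any existing directory: the DIRECTORY contents, the keyword sets and the candidate_bots function are all invented for illustration.

```python
# Hypothetical directory: chatbot type -> keywords that suggest it is relevant.
DIRECTORY = {
    "flight booking": {"flight", "fly", "airline"},
    "restaurant booking": {"table", "dinner", "restaurant"},
    "drinks ordering": {"drink", "beer", "pub"},
}

def candidate_bots(request, directory=DIRECTORY):
    """Return the chatbot types whose keywords appear in the request."""
    # Split on whitespace, then strip surrounding punctuation from each word.
    words = {w.strip(".,!?\"'").lower() for w in request.split()}
    return sorted(name for name, keywords in directory.items() if words & keywords)
```

With the request quoted above, candidate_bots("please book me a flight from New York to London") offers only the "flight booking" type, and the user would then choose among the bots of that type. A real directory would need far better matching than exact keywords.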
A third challenge is that these new types of interfaces will need to simplify some of the technical concepts behind natural language processing systems, e.g. how do you find evidence for potential answers in question-answering systems?
I believe that another challenge is simply getting noticed. Quora allows users to ask questions which are then answered by people who have expertise in the subject matter. This type of site will likely become even more popular once chatbots are widely available, since it provides a way for both humans and chatbots to answer questions.
The key message here is that natural language processing will become increasingly important over the next 5 years, but I don’t expect it to be widely used. The biggest challenge with conversational interfaces will be getting noticed by users who are interested in trying them out.