This article surveys conversational agents. I compiled it while working to build one myself.

Architecture

The general architecture of a conversational agent is given in Figure 1. Traditional techniques follow this architectural division closely and implement a specialized module for each component of the agent. However, it is becoming increasingly common to merge two or more components into a single model, moving towards more end-to-end conversational agents. This is especially true of large deep learning models, which can implement more than one component, often with better results because information is shared between the tasks of the individual components.

Figure 1: Architecture of a Conversational Agent
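To make the modular division concrete, here is a minimal sketch of a traditional pipeline in Python. The figure is not reproduced here, so the four stages below (language understanding, state tracking, dialog policy, response generation) are my assumption of a typical split rather than a transcription of Figure 1. All names, intents, slot values and reply templates are hypothetical, and keyword rules stand in for the trained models a real agent would use.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

KNOWN_CITIES = ("paris", "london", "tokyo")  # hypothetical slot values


@dataclass
class DialogState:
    """Running memory of the conversation (hypothetical structure)."""
    slots: dict = field(default_factory=dict)
    last_intent: Optional[str] = None


def understand(utterance):
    """Language understanding: map text to an (intent, slots) pair.

    Keyword rules are used here purely for illustration.
    """
    tokens = re.findall(r"[a-z]+", utterance.lower())
    slots = {}
    for token in tokens:
        if token in KNOWN_CITIES:
            slots["city"] = token
    if "weather" in tokens:
        intent = "ask_weather"
    elif "hi" in tokens or "hello" in tokens:
        intent = "greet"
    elif slots:
        intent = "inform"  # the user only supplied a slot value
    else:
        intent = "unknown"
    return intent, slots


def track_state(state, intent, slots):
    """State tracking: fold the latest turn into the dialog state."""
    state.slots.update(slots)
    if intent != "inform":  # "inform" fills slots for the pending intent
        state.last_intent = intent
    return state


def decide(state):
    """Dialog policy: choose the next system action from the state."""
    if state.last_intent == "greet":
        return "greet"
    if state.last_intent == "ask_weather":
        return "report_weather" if "city" in state.slots else "ask_city"
    return "fallback"


def generate(action, state):
    """Response generation: render the chosen action as text."""
    if action == "greet":
        return "Hello! How can I help?"
    if action == "ask_city":
        return "Which city do you mean?"
    if action == "report_weather":
        city = state.slots["city"].capitalize()
        return f"I would look up the weather for {city} here."
    return "Sorry, I did not understand that."


def respond(utterance, state):
    """Run one user turn through all four modules in order."""
    intent, slots = understand(utterance)
    track_state(state, intent, slots)
    return generate(decide(state), state)
```

Calling `respond` repeatedly with the same `DialogState` carries context across turns, so a follow-up such as "In Paris" can complete an earlier weather question. An end-to-end model would replace some or all of these functions with a single learned component.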


State-of-the-art techniques

Here’s a survey of state-of-the-art techniques for building either an entire conversational agent or one or more of the individual modules shown in Figure 1. All papers are linked in the table and again at the end, along with a few other interesting papers, papers on human-in-the-loop techniques, and studies on building conversational agents.



State-of-the-art techniques for each module

Here is a grouping of papers based on which architectural component their techniques apply to. The key points of each paper are summarized.



Frameworks

Frameworks are an important starting point for developing a conversational agent, as they can provide a standard data format, pretrained models, dialog system pipelines, etc. I ultimately chose Rasa to implement my conversational agent, but there are many frameworks suited to a wide variety of tasks. Here’s a survey of the most popular platforms and frameworks available.



Datasets

Datasets are necessary for training custom models for language understanding or dialog management, and for training end-to-end conversational agents. High-quality datasets are very important for dialog applications, since the agent needs to produce good text to hold a satisfying conversation. Here is a survey of the most popular datasets available for training conversational agents.



Additional notes on select papers



Further Reading

Speech and Language Processing by Dan Jurafsky and James H. Martin
Chapter 15: Chatbots & Dialogue Systems
https://web.stanford.edu/~jurafsky/slp3/15.pdf

Conversational Agents: Theory and Applications by Mattias Wahde and Marco Virgolin
https://browse.arxiv.org/pdf/2202.03164.pdf
Here’s an extremely short summary of the key points they present.
