How we built our own scalable, real-time framework to create and serve conversational systems.
Voice-based systems are becoming pervasive, and the NLP systems that power them have the daunting task of understanding what a user says with high precision and high coverage (recall), and of doing so in real time. There are many frameworks that help create such systems, and over time the building of conversational agents has been distilled into a few primitives such as intents and entities, positive and negative conversational flows, user stories, etc. Over the course of two years we (at Slang Labs) built such a framework that helps define conversational flows, and designed APIs (on an SDK) that help perform actions or request information directly in apps. The premise of the SDK is to intertwine voice-based requests with UI flows and get the best of both worlds.
In this talk we will walk through how we designed this system, from the ground up, to be scalable (in order to serve multiple apps at the same time) and expressive (to handle the various facets of conversational and voice systems), with the key point being good data modeling. The key takeaways of this talk are: how to design a framework that is both expressive and precise; the importance of a data schema, or data layer, that ties all the parts of the system together; how this ties into building stateless architectures; how it makes it easier to load and serve from multiple ML models in parallel; and finally how it makes it easy to test ML systems in production for quality, speed and reliability. We tie in real-world anecdotes from our work to explain these concepts and the choices we made, and how those choices are affected even more by our unusual paradigm of voice tied in with a GUI. The talk should therefore be useful both to the larger community of conversational systems developers and to other developers who depend on systems whose inference models are built on an amalgamation of various types of information.
I have structured the outline as follows: each item is a question with a one-line takeaway, which will be detailed in the actual talk.
What is a conversational system? A conversational system performs actions or supplies information based on a user’s commands or queries.
What are the primitives of a conversational system? There are many primitives to a conversational system: user stories, intents, entities, actions, prompts, statements, follow-on questions, context. The question is: what is the smallest set of primitives that lets anyone define any set of user experiences for voice and UI together?
If there are so many primitives, how did you choose to settle on a few of them and not the others? We discuss the tradeoffs of designing primitives for a system. We can define many primitives that step on each other’s toes, or define as few as possible, each doing one job and doing it well. The second approach requires a lot more work from the developer, who has to compose these primitives to create magical applications, so it comes at a price.
How do you codify the primitives? There are many moving parts to the system: the console, which is a UI that helps create and use primitives; the backend processing, which works on them to infer meaning; and the actual app, which performs actions based on them. It is important that there be a single source of truth behind these primitives, for maintainability and for ease of development in larger teams that work on various parts in semi-isolation. We discuss the need to codify the primitives into a schema that can be shared across the various parts of the system, and how early investment in a schema can save a lot of headaches later.
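To make the idea of a shared schema concrete, here is a minimal sketch, not the actual Slang Labs schema. The class names (Entity, Intent, AppSchema) and their fields are hypothetical; the point is only that the console, the backend and the app can all read and write the same serialized definition.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass(frozen=True)
class Entity:
    # Hypothetical fields, chosen for illustration only.
    name: str
    type: str
    required: bool = False


@dataclass(frozen=True)
class Intent:
    name: str
    examples: List[str] = field(default_factory=list)
    entities: List[Entity] = field(default_factory=list)


@dataclass(frozen=True)
class AppSchema:
    app_id: str
    version: int
    intents: List[Intent] = field(default_factory=list)

    def to_json(self) -> str:
        # One serialized form that every part of the system can consume.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "AppSchema":
        data = json.loads(raw)
        intents = [
            Intent(
                name=i["name"],
                examples=list(i["examples"]),
                entities=[Entity(**e) for e in i["entities"]],
            )
            for i in data["intents"]
        ]
        return AppSchema(app_id=data["app_id"], version=data["version"], intents=intents)


schema = AppSchema(
    app_id="travel",
    version=1,
    intents=[
        Intent(
            name="book_ticket",
            examples=["book a ticket to Delhi"],
            entities=[Entity(name="destination", type="city", required=True)],
        )
    ],
)
restored = AppSchema.from_json(schema.to_json())
```

Because the definition round-trips through JSON without loss, any component that can parse JSON can treat it as the source of truth.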
How do you choose which configurable parameters to release to the users? When building classifiers and the auxiliary algorithmic logic around them, such as feature selection, there are many choices to be made. What should the hyperparameters for the features be? Should we release control over these to the users? What should the defaults be for people who don’t want to, or don’t know how to, play with these hyperparameters?
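One way to frame this choice is sketched below, under assumptions: the parameter names (ngram_max, min_confidence, use_stemming) and the set of user-tunable parameters are invented for illustration and are not taken from the actual system. Every parameter has a default, and only an explicit allow-list is exposed to users.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ClassifierConfig:
    # Hypothetical hyperparameters with defaults for users who never touch them.
    ngram_max: int = 2
    min_confidence: float = 0.6
    use_stemming: bool = True


# Hypothetical allow-list: only these parameters are released to users.
USER_TUNABLE = {"min_confidence"}


def resolve_config(overrides: Dict[str, Any]) -> ClassifierConfig:
    """Apply user overrides on top of the defaults, rejecting internal parameters."""
    not_allowed = set(overrides) - USER_TUNABLE
    if not_allowed:
        raise ValueError(f"not user-tunable: {sorted(not_allowed)}")
    return replace(ClassifierConfig(), **overrides)


default_config = resolve_config({})
tuned_config = resolve_config({"min_confidence": 0.8})
```

Keeping the allow-list separate from the config class means a parameter can be released to users later without changing how the classifier reads its configuration.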
What is a stateless architecture, and why use one? (Touched upon) If we want to serve multiple people at the same time, and in real time, we need multiple instances of the server application running in parallel. If we maintained state in those instances, the load balancer would have a doubly difficult job. We discuss why we need a stateless architecture, how it makes our life difficult in some respects, how we actually execute it, and how the framework for conversational systems can fortunately be fit into a stateless design. We also discuss the tradeoff of doing extra computation on every request in order to remain stateless.
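A stateless handler can be sketched as follows. This is a toy illustration rather than the real request format: the "text", "context" and "prompt" keys and the keyword matching are assumptions. The conversation context travels with each request and response, so any server instance can handle any turn.

```python
from typing import Any, Dict


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Handle one turn using only what the request carries; nothing is kept on the server."""
    # Copy the incoming context so the caller's data is never mutated.
    context = dict(request.get("context", {}))
    text = request["text"].lower()

    # Toy keyword matching stands in for real NLP inference.
    if "book" in text:
        context["intent"] = "book_ticket"
    if context.get("intent") == "book_ticket" and "to " in text:
        context["destination"] = text.split("to ", 1)[1].strip()

    if context.get("intent") != "book_ticket":
        prompt = "Sorry, I did not understand."
    elif "destination" not in context:
        prompt = "Where would you like to go?"
    else:
        prompt = f"Booking a ticket to {context['destination']}."

    # The updated context goes back to the client, which sends it with the next turn.
    return {"prompt": prompt, "context": context}


first = handle_request({"text": "Book a ticket"})
second = handle_request({"text": "to Delhi", "context": first["context"]})
```

The two calls could be served by different instances behind a load balancer; the price is that the context is re-sent and re-processed on every request.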
You may have different workloads, such as long, intensive computes like training and smaller but far more numerous computes like inference. How do you design around that? We split these different types of compute into multiple microservices that talk to each other over HTTP. We discuss how having a good data model makes it easier to create a microservice architecture.
How do you serve multiple customers with different models at the same time? We have to train models, save models and serve inference requests from these models, all in a stateless fashion. We discuss how we store and serve models, and how the schema helps by being the linchpin of the microservice and stateless architecture.
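As a rough sketch of serving several customers from one stateless service, assume models are stored in a shared store keyed by app id and schema version. The ModelStore class, the in-memory dictionary standing in for real blob storage, and the word-to-intent toy model are all assumptions made for this example.

```python
import pickle
from typing import Dict, List, Optional, Tuple


class ModelStore:
    """Stand-in for shared storage; a real system would use a database or blob store."""

    def __init__(self) -> None:
        self._blobs: Dict[Tuple[str, int], bytes] = {}

    def save(self, app_id: str, version: int, model: Dict[str, str]) -> None:
        self._blobs[(app_id, version)] = pickle.dumps(model)

    def load(self, app_id: str, version: int) -> Dict[str, str]:
        return pickle.loads(self._blobs[(app_id, version)])


def train(examples: Dict[str, List[str]]) -> Dict[str, str]:
    # Toy "model": remember which intent each training word belongs to.
    model: Dict[str, str] = {}
    for intent, utterances in examples.items():
        for utterance in utterances:
            for word in utterance.lower().split():
                model[word] = intent
    return model


def infer(store: ModelStore, app_id: str, version: int, text: str) -> Optional[str]:
    # The request names the app and version, so any instance can load the right model.
    model = store.load(app_id, version)
    votes: Dict[str, int] = {}
    for word in text.lower().split():
        intent = model.get(word)
        if intent is not None:
            votes[intent] = votes.get(intent, 0) + 1
    return max(votes, key=votes.get) if votes else None


store = ModelStore()
store.save("travel", 1, train({"book_ticket": ["book a ticket"]}))
store.save("grocery", 1, train({"add_to_cart": ["add milk to cart"]}))
```

The training service only writes to the store and the inference service only reads from it, so the two can be scaled separately.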
With long development cycles, how do you keep it all together? The answer is testing, but testing takes many forms for us. Apart from traditional software testing such as unit tests, we have to test our NLP systems. We discuss how we divide our tests into must-not-fail tests, how we work NLP logic into the unit tests, and how we keep benchmark systems that show whether the system is improving, and not regressing, with every change to NLP logic. We also discuss how we perform blind tests and human user research to understand how people actually react to the quality of our systems, and how a good data model makes all of this easier.
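The split between must-not-fail tests and benchmarks could look roughly like the sketch below. The function names, the toy classifier and the baseline values are illustrative assumptions, not the team's actual test harness.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (utterance, expected intent)


def accuracy(classify: Callable[[str], str], examples: List[Example]) -> float:
    correct = sum(1 for text, expected in examples if classify(text) == expected)
    return correct / len(examples)


def check_release(
    classify: Callable[[str], str],
    must_not_fail: List[Example],
    benchmark: List[Example],
    baseline_accuracy: float,
) -> List[str]:
    """Return the reasons to block a release; an empty list means it passes."""
    # Must-not-fail examples are hard requirements: a single miss blocks the release.
    failures = [
        f"must-not-fail: {text!r}"
        for text, expected in must_not_fail
        if classify(text) != expected
    ]
    # The benchmark may contain misses, but the score must not drop below the baseline.
    score = accuracy(classify, benchmark)
    if score < baseline_accuracy:
        failures.append(f"benchmark regressed: {score:.2f} < {baseline_accuracy:.2f}")
    return failures


def toy_classify(text: str) -> str:
    # Toy classifier used only to exercise the checks.
    return "book_ticket" if "book" in text.lower() else "unknown"


must_not_fail = [("book a ticket", "book_ticket")]
benchmark = [("book a flight", "book_ticket"), ("cancel my order", "cancel_order")]
passing = check_release(toy_classify, must_not_fail, benchmark, baseline_accuracy=0.5)
failing = check_release(toy_classify, must_not_fail, benchmark, baseline_accuracy=0.75)
```

The must-not-fail set fits into an ordinary unit test run, while the baseline is updated whenever a change to the NLP logic improves the benchmark score.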
There are no technical requirements, but because of time constraints we will touch on multiple points of ML and NLP quickly, so it may help to know a bit about them. Otherwise, there are plenty of takeaways for those who don’t.
Ved Mathai is the NLP researcher/engineer at Slang Labs. The team builds voice augmented experiences (VAX), a framework for adding voice experiences on top of traditional mobile applications. He works on the NLP side of the framework and designs new primitives and tools that help users build better conversational systems faster.