From Voice-First Development by Ann Thyme-Gobbel
This article delves into capturing and documenting voice UI design.
Capturing and documenting VUI design
When asked what makes a VUI design spec, one developer said, “If you’re a developer who’s new to voice, you’ll be surprised at the level of detail in quality VUI specs. Non-voice devs will naturally think ‘well, how hard can this be?’ to which the answer is always ‘harder than you think!’ Dialog design has subtle complexities. A competent VUI designer takes the time and effort to spell out how it all ought to work, leaving nothing to interpretation. Using clearly specified logic is excellent. A good dialog designer has to understand basic programming logic, and a good developer needs to understand VUI designers have reasons for what they do. I think it’s important to establish mutual common ground. Have regular meetings and ask each other questions.”
A product manager points out, “Clearly separate intended behavior from implementation details. Let’s say you’re building some device. Instead of ‘Feature X has the settings A, B, C, controlled with a toggle switch,’ start with ‘Feature X has the settings A, B, C.’ Then link to and specify the UI for the different interfaces: ‘X is controlled with a toggle on the top panel’ for one, and ‘Typical requests for X are “Set X to A” or “Change X to B.” If D or E is requested, respond that these are not supported’ for the other. This helps you treat the different UIs equally while remembering they’re not the same.”
VUI designs are typically captured and documented in three ways, each with its own strengths and purposes — you should make use of all three:
- Dialog flows
- Sample dialogs
- Detailed design specification
Let’s look briefly at each one.
Dialog flows
Some dialog flows are high-level overviews of the core intents and experience; others are highly detailed. Flows are the best tool for everyone on the team to establish, for example, where data access takes place and the overall sequence of steps in multi-step dialogs with multiple slots. You want to settle on the overall logic flow before getting into the detailed design. When you add prompt wording, use a notation that signifies draft status. We often use all caps or angle brackets. Once your detailed design is done, you might update your flows with the prompt wording. Figure 1 shows a few close-to-real-world examples, scrubbed to protect the innocent.
SUCCESS TIP: STICK TO LOGIC AND DESCRIPTIONS IN EARLY FLOWS
To keep everyone focused on the flow and logic early on, rather than on the wording of prompts and other details, leave fully phrased prompts out of those flows and use descriptions instead. For example, all error messages might say “ERROR MESSAGE + OFFER OPTION” instead of a context-specific complete sentence like “Doctor Jones doesn’t have an office phone. Want to call the lab instead?” Once you move into detailed design, you can easily change the descriptors to full prompts.
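As a rough illustration of keeping early flows at the level of logic and descriptions, a flow can be captured as plain data with placeholder descriptors in place of prompt wording. This is a minimal sketch, not a format the book prescribes: the node names, the `says` and `next` keys, and both helper functions are hypothetical.

```python
# Hypothetical early-stage flow for a "call a contact" dialog.
# Every "says" value is a draft descriptor in angle brackets, not final wording.
FLOW = {
    "START": {"says": "<GREETING + ASK WHO TO CALL>", "next": ["GET_CONTACT"]},
    "GET_CONTACT": {"says": "<DATA LOOKUP: CONTACT>", "next": ["CONFIRM_CALL", "NO_NUMBER"]},
    "CONFIRM_CALL": {"says": "<CONFIRM NAME + PHONE TYPE>", "next": ["END"]},
    "NO_NUMBER": {"says": "<ERROR MESSAGE + OFFER OPTION>", "next": ["GET_CONTACT", "END"]},
    "END": {"says": "<CLOSING>", "next": []},
}


def dangling_links(flow):
    """Return (node, target) pairs whose target is not defined in the flow."""
    return [
        (name, target)
        for name, node in flow.items()
        for target in node["next"]
        if target not in flow
    ]


def draft_nodes(flow):
    """Return the names of nodes whose wording is still a draft descriptor."""
    return [name for name, node in flow.items() if node["says"].startswith("<")]


if __name__ == "__main__":
    print("Dangling links:", dangling_links(FLOW))
    print("Nodes still in draft:", draft_nodes(FLOW))
```

A check like `dangling_links` lets the team agree on the logic first. `draft_nodes` then doubles as a to-do list once detailed design begins.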
Sample dialogs
Sample dialogs are snapshots of representative voice interactions between users and your voice app. Each dialog should include clear descriptions of the context and condition for that dialog. They should illustrate both happy paths and error conditions. Above all, they should exist both as written text (for convenience) and as audio recordings (for relevance). They’re great tools for illustrating the intended style and features of any voice app. Sample dialogs and dialog flows are great tools for making user stories concrete. Figure 2 gives you a taste of what sample dialogs might look like.
SUCCESS TIP: PRESENT SAMPLE DIALOGS AS AUDIO
Text versions of sample dialogs are useful for thinking through designs and discussing them with others, but never forget that spoken and written language are quite different. In reviews, always start with an audio version, pre-recorded using the actual voice if possible, otherwise read out loud or even role-played. Don’t share the written text until later. In fact, review documentation in large voice companies sometimes includes “DO NOT TURN THIS PAGE” warnings. While working out the design, talk to yourself, and read samples to others around you. This is VUI: it’s all about the sound.
Detailed design specifications
Your detailed VUI design needs to cover enough detail to make clear to developers what the intended behavior should be for every context and condition. This is obviously true for any design in any modality, but it’s probably less obvious how to do this for voice than for a visual interface where you include images of screens, exact measurements, and color codes. Detailed VUI design documentation should cover the following at a minimum:
- Every intent: name and description
- For each intent:
  - Archetype utterances, and any expected words
  - Slot names and values, required and optional
  - Outcome and next step for every combination of context and conditions
    - Context: user identity or category, user preferences, environment, previous user request and result (dialog path), current system activity, and so on
    - Conditions: behavior for each expected user request status (recognized and handled, recognized and not handled, not recognized, and so on)
- Prompts: reference labels and exact wording for each context and condition, including error handling, retries, randomization, and so on
- Logic and pseudo code describing behavior clearly and consistently
- Data needs: type, format, from where and saved to where
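To make the checklist above concrete, here is a minimal sketch of how one intent's entry might be held as structured data. The class names, field names, prompt labels, and next-step names are all hypothetical; a real spec would follow whatever your platform and team conventions require.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    values: list[str]
    required: bool = True


@dataclass
class Intent:
    name: str
    description: str
    archetype_utterances: list[str]
    slots: list[Slot] = field(default_factory=list)
    # Maps (context, condition) to (prompt label, next step).
    outcomes: dict[tuple[str, str], tuple[str, str]] = field(default_factory=dict)

    def missing_required(self, filled):
        """Names of required slots the user has not filled yet."""
        return [s.name for s in self.slots if s.required and s.name not in filled]


# Illustrative entry; all labels and values are made up for this sketch.
call = Intent(
    name="call_contact",
    description="Place a phone call to a named contact.",
    archetype_utterances=["Call $NAME", "Call $NAME on $PHONETYPE"],
    slots=[
        Slot("NAME", ["<any contact name>"]),
        Slot("PHONETYPE", ["mobile", "office", "home"], required=False),
    ],
    outcomes={
        ("known_user", "recognized_handled"): ("call_confirm_01", "PLACE_CALL"),
        ("known_user", "recognized_not_handled"): ("call_error_01", "OFFER_OPTION"),
        ("known_user", "not_recognized"): ("call_retry_01", "REPROMPT"),
    },
)

if __name__ == "__main__":
    print(call.missing_required({"PHONETYPE": "mobile"}))
    print(call.outcomes[("known_user", "not_recognized")])
```

Keying outcomes by a (context, condition) pair mirrors the requirement that every combination has a defined prompt and next step, so a missing key points to a gap in the design.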
The tools you use to document your design obviously depend on where you work, who you work with, and what platform and voice service you’re using. Don’t use a documentation style or tool that ignores limitations on the design or implementation. Don’t use one that makes it difficult to capture what can be done either! You’ll probably use different approaches for multi-step dialog tasks versus one-step “one and done” requests.
SUCCESS TIP: PICK A NOTATION
Whether you’re working with others or by yourself, be clear about any notation you use. To convey the design (or interpret it as intended), you’ll need a notation that is clear and easy to use, easy to type in flows and specs, and transfers well between design and development. We like to use all caps plus some symbol for easier search to indicate variable content, like $FOOD. Your choice is influenced by your development environment, programming language, and tools, but it should be usable for all involved. For archetype utterances, it’s important not to lose track of the fact that they are illustrative rather than a promise to handle those precise utterances. It’s equally important to avoid detailed design too soon. Flag the “serving suggestion” or draft nature of a prompt. We often use angle brackets for this, but you can use anything that works well for your situation. For example: <Okay. Calling $NAME, $PHONETYPE.> Pseudo code, if done right, can be useful because it limits the need for interpretation and is easy to search on. It can also help the design process when making sure all use cases are covered. We’ve made great use of simple pseudocode for contexts and conditions, such as “is_muted = TRUE” or “$AMOUNT >= maxAmount.” Whatever you use, make sure everyone is on board and clear on what things mean.
VUI design documentation approaches
Let’s take a look at your most common options for documenting VUI designs:
Approach: Self-contained design tools
Description: Graphical software tools for VUI design. The goal is to minimize coding and worrying about details to quickly create a functional VUI dialog that can run on Alexa or Google or either. Examples: BotTalk, Invocable (was Storyline), PullString, SaySpring (now part of Adobe), Voice Apps, Voiceflow, and others.
Pros: Fast. Can lay out a design and turn it into a functional dialog. Great for concepts and simple contexts.
Cons: Typically limited in functionality and features. Many are not mature, still changing. All emphasize speed of creation over flexibility and quality design. Some favor one platform over another. Some ramp-up time needed.
Take-home: Great for smaller projects and sample dialogs. Evolving, so find what works for you and doesn’t limit you.
Approach: Platform-specific tools
Description: Examples include Google Dialogflow (was api.ai) and others.
Pros: Limited to what’s available through the platform or voice service, which automatically keeps you from designing out-of-scope features.
Cons: Limited to what’s available through the platform or voice service. Most are still evolving, and can be unstable. Usually involves some coding, which can be a drawback for some designers. Some ramp-up time needed.
Take-home: If you know you’ll design for one of the common platforms, these tools are well worth looking at.
Approach: Proprietary in-house tools
Description: Highly flexible and powerful VUI design tools often created by a large effort over a long time. Examples: Nuance Application Studio (NAS), and others.
Pros: Powerful and mature. Typically ties together every kind of design documentation, automatically updating change in all relevant places to keep things synchronized. Can generate code and prompt lists, even a prototyping set-up. Provides features corresponding to what’s available in the platform and only those features.
Cons: Not accessible unless you’re in a relationship with the company who controls the tool. Auto-generated code isn’t optimized for production unless everything is streamlined, and it typically needs rework for larger realistic complex systems. Associated with a particular company’s platform, it isn’t generalized. Ramp-up time can be steep.
Take-home: Top of the top. Use them if you have access.
Approach: Standard documentation tools
Description: Standard office software, sometimes with added tailored scripts. Examples: text (Word), spreadsheets (Excel, Google Docs), flow (Visio, Balsamiq). Can hook to databases and generate prompt lists.
Pros: Completely flexible, can tailor exactly to your needs. Not dependent on others changing feature availability. No training ramp-up time needed.
Cons: You have to do all the work. Any auto-generation is limited to what you create. Make sure everyone referencing the documentation uses the same notation.
Take-home: Probably the most widespread VUI design tools today. Don’t under-estimate the power of simplicity.
In figure 3 you can see a couple of examples from different VUI design specifications. The spreadsheet approach is best for broad-but-shallow interactions, like initial open-ended interactions that are no more than one or two turns long. Spreadsheets provide an easy view of the big picture and patterns across contexts. For longer transactional dialogs, a combination of flow and a word processing format with clickable links is often the better choice. In bigger implementations, you’ll probably combine both.
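The “tailored scripts” that standard documentation tools rely on can be quite small. The sketch below assumes spec rows exported from a spreadsheet as dictionaries and generates a prompt list from them, reusing the notation described earlier: `$NAME`-style variables and angle brackets for draft wording. The row fields, labels, and function names are hypothetical, and the second prompt is adapted from the error-message example above.

```python
import csv
import io
import re
from string import Template

# Matches $VARIABLE names in the all-caps-plus-symbol notation.
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")

# Hypothetical rows, as they might be exported from a spec spreadsheet.
SPEC_ROWS = [
    {"label": "call_confirm_01", "wording": "<Okay. Calling $NAME, $PHONETYPE.>"},
    {"label": "call_error_01",
     "wording": "$NAME doesn't have an office phone. Want to call the lab instead?"},
]


def is_draft(wording):
    """True if the wording is wrapped in angle brackets (draft marker)."""
    text = wording.strip()
    return text.startswith("<") and text.endswith(">")


def strip_draft(wording):
    """Remove the draft brackets, if present."""
    text = wording.strip()
    return text[1:-1] if is_draft(text) else text


def variables(wording):
    """Sorted names of the $VARIABLES used in the wording."""
    return sorted(set(VARIABLE.findall(wording)))


def render(wording, **values):
    """Fill in $VARIABLES; raises KeyError if a value is missing."""
    return Template(strip_draft(wording)).substitute(**values)


def prompt_list_csv(rows):
    """Return a CSV prompt list with one line per unique prompt label."""
    seen = {}
    for row in rows:
        label, wording = row["label"], row["wording"]
        if label in seen and seen[label] != wording:
            raise ValueError(f"label {label!r} has conflicting wordings")
        seen[label] = wording
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["label", "status", "variables", "wording"])
    for label in sorted(seen):
        wording = seen[label]
        status = "draft" if is_draft(wording) else "final"
        writer.writerow([label, status, " ".join(variables(wording)), strip_draft(wording)])
    return out.getvalue()


if __name__ == "__main__":
    print(prompt_list_csv(SPEC_ROWS))
    print(render(SPEC_ROWS[0]["wording"], NAME="Doctor Jones", PHONETYPE="mobile"))
```

Rejecting a label that appears with two different wordings is one way to act on the caution that everyone referencing the documentation must use the same notation. A stricter or looser policy may suit your team better.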
As you can see, you have lots of choices. There is no ‘One Way’ of VUI documentation that all voice practitioners use. You need to determine what works for you based on many factors: your work style, team structure and familiarity with voice, company demands, voice platform, infrastructure, existing designs, tool availability, type of voice interaction, multimodality, or content. We have used all the approaches mentioned here in various combinations, with varying results and levels of happiness. Investigate your options; they’ll constantly change as tools come and go. Make use of anything already in place, such as design patterns or existing designs for similar voice dialogs. Those may impact your documentation choice, or make it faster to get started. At times you’ll have little or no choice because the approaches and formats have already been set.
How to review a VUI design, as told by a developer-turned-VUI-designer: “If there’s a paired dialog flow, I always start with that. For me it makes the most sense. A lot of developers don’t care about the dialog flow and I have no *#! idea why. I always have. Big picture, then smaller details.
I can’t emphasize the dialog flow’s importance enough, like with mixed audiences. Business folks understand them. It’s the easiest doc for the largest range of people to understand. It’s great for orienting people in a bigger system. It’s helpful to go back and forth between the dialog flow and VUI spec, even having them both up at once allows people to reference what you went over.
I like to go through the spec module by module, first explaining at a high level what the module does such that when we get into the details, people can connect that to what I said earlier. The first few modules might take longer but then if it’s clear and predictable (like similar conditions written the same way), even a non-techie person can learn the lingo and follow any detailed VUI design. Then I usually go node-by-node to a point. When there’s a million conditions it may become more “In scenario A, we go down this path” and follow that through, then go back to scenario B. Specs must have clickable links which allow you to step through any scenario.
Explaining what to look out for and why helps folks who have less of a tech perspective, even with easy modules. Like “this is your transfer funds module…we have checks in place to ensure there’s enough money, that this business rule is caught here, etc. etc.” Talking about business rules piques their interest because everyone knows what a transfer funds module does. Those small details upfront help get them interested and also let them know you “got it.”
I’m a developer who always cared about design. I like when developers say, “If you do this part this way, the experience is the same but it’ll be much easier for me to code.” That’s a good sign they understand, are paying attention, and can compromise.”
Many tools today are aimed at letting developers jump in with little or no voice or design expertise. It’s cool, and one reason for the explosion of voice development, but for enterprise-level voice systems, you’ll probably need something else to coordinate and track details across teams. And you’ll certainly start with design, not development.
The detailed design references another important set of documented information: the VUI style guide. The core purpose of the style guide is to establish consistency. This means consistency on every level: in prompt wording and delivery within and across conversations, in VUI behavior, interpretation, and relative to any branding. Style guides are crucial when there’s more than one designer working on portions of a VUI or on related VUIs for a company, but even a lone designer needs a style guide to keep dialogs consistent. Developers and speech scientists need it to ensure that user behavior is handled consistently and that words and phrases are consistently interpreted.
SUCCESS TIP: BEWARE OF DESIGN BY COMMITTEE AND IN A VACUUM
The biggest challenge during VUI design is funneling the (usually) good intentions of everyone involved with the project. Because the medium is voice, everyone has an opinion about what users will say and how best to respond. “I speak the language; how hard can this be” is a natural thought. Use the sample dialogs to settle on the overall conversational style, capturing decisions and reasons behind them in your style guide to keep the style and behavior consistent. Early design is a good time to settle on the amount of variation in prompting because it’ll likely impact the development effort.
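To give a sense of why the amount of prompt variation affects development effort, here is a minimal sketch of one possible variation policy: pick randomly among approved wordings, but never repeat the previous one. The class name, the policy, and the sample wordings are assumptions made for illustration, not something the style guide discussion prescribes.

```python
import random


class PromptVariants:
    """Randomly selects among prompt wordings, avoiding an immediate repeat."""

    def __init__(self, variants, rng=None):
        if not variants:
            raise ValueError("at least one variant is required")
        self.variants = list(variants)
        self.rng = rng or random.Random()
        self.last = None

    def next(self):
        # Exclude the previous wording unless it is the only one available.
        choices = [v for v in self.variants if v != self.last] or self.variants
        self.last = self.rng.choice(choices)
        return self.last


if __name__ == "__main__":
    acknowledgments = PromptVariants(["Okay.", "Sure.", "Got it."])
    for _ in range(5):
        print(acknowledgments.next())
```

Even this small policy adds state that has to be stored and tested per prompt. That is part of the reason to settle the degree of variation early.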