[[!meta title="Enhancing productivity with voice computing"]]
[[!meta copyright="Copyright © 2023 Blaine Mooers"]]
[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]
<!-- Initially generated with emacsconf-publish-talk-page and then left alone for manual editing -->
<!-- You can manually edit this file to update the abstract, add links, etc. --->
# Enhancing productivity with voice computing
Blaine Mooers (he/him/his) - Pronunciation: pronounced like "moors", blaine-mooers(at)ouhsc.edu, <https://basicsciences.ouhsc.edu/bmb/Faculty/bio_details/mooers-blaine-hm-phd>, <https://twitter.com/BlaineMooers>, <https://github.com/MooersLab>, <https://codeberg.org/MooersLab>, mastodon(at)bhmooers
[[!inline pages="internal(2023/info/voice-before)" raw="yes"]]
Voice computing uses speech recognition software to convert speech into text, commands, or code.
While there is a venerated program called Emacspeak for converting text into speech, an
"EmacsListens" for converting speech into text is not available yet.
The Emacs Wiki describes the underdeveloped situation for speech-to-text in Emacs.
I will explain how two external software packages convert my speech into text and computer
commands that can be used with Emacs.
First, I present some motivations for using voice computing.
These can be divided into two categories: productivity improvement and health-related issues.
In this second category, there is the under-appreciated cure for "standing desk envy";
the cure is achievable with a large dose of voice computing while standing.
I found one software package (Voice In Plus, <https://dictanote.co/voicein/plus/>) to be quite
accurate for speech-to-text, or dictation, but less versatile for speech-to-commands.
I have used this package daily and I found a three-fold increase in my daily word count almost
immediately.
Of course, there are limits here; you can talk for only so many hours per day.
Second, I found another software package that has a less accurate language model (Talon Voice,
<http://talon.wiki/>) but that supports custom commands that can be executed anywhere you can
place the cursor, including in virtual machines and on remote servers.
Talon Voice will appeal to those who like to tinker with configuration files, yet it is easy to
use.
I will explain how I have integrated these two packages into my workflow.
I have developed a library of commands that expand 94 English contractions when spoken.
This library eliminates tedious downstream editing of formal prose where I do not use
contractions.
The library is available on GitHub for both Voice In Plus
(<https://github.com/mooersLab/voice-in-plus-contractions>) and Talon Voice
(<https://github.com/MooersLab/talon-contractions>).
I also supply interactive quizzes for mastering the basic Voice In commands
(<https://github.com/MooersLab/voice-in-basics-quiz>) and the Talon Voice phonetic alphabet
(<https://github.com/MooersLab/talon-voice-quizzes/qTalonAlphabet.py>).
I learned the Talon alphabet in one day by taking the quiz at spaced intervals.
The quiz took only 60 seconds to complete when I was proficient.
I gave a 60-minute talk on this topic to the Oklahoma Data Science Workshop
on November 16, 2023 (<https://mediasite.ouhsc.edu/Mediasite/Channel/python>).
This workshop meets once a month and is for people interested in data
science and scientific computing. You do not have to be an Oklahoma
resident to attend. Send me an e-mail if you want to be added to our mailing list.
# About the speaker:
I am an Associate Professor of Biochemistry at the University of
Oklahoma Health Sciences Center. I use X-ray crystallography to study
the structures of RNA, proteins, and protein-drug complexes. I have
been using Python and LaTeX for a dozen years, and Jupyter Notebooks
since 2013. I have been using Emacs every day for 2.5 years. I
discovered voice computing this summer when my chronic repetitive
stress injury flared up while entering data in a spreadsheet. I
tripled my daily word count by using speech-to-text, and I get a
kick out of running remote computers by speech-to-command.
# Transcript
[slide 1]
Hi, I'm Blaine Mooers. I'm an associate professor of biochemistry at
the University of Oklahoma Health Sciences Center in Oklahoma City.
My lab studies the role of RNA structure in RNA editing. We use X-ray
crystallography to study the structures of these RNAs. We spend a lot
of time in the lab preparing our samples for structural studies, and then
we also spend a lot of time at the computer analyzing the resulting data.
I was seeking ways of using voice computing to try to enhance
my productivity.
[slide 2]
I divide voice computing into three activities: speech-to-text or dictation,
speech-to-commands, and speech-to-code. I'll be talking about
speech-to-text and speech-to-commands today because these are two
activities that are probably most broadly applicable to the workflows of
people attending this conference.
[slide 3]
This talk will not be about Emacspeak. This is a venerated program for
converting text to speech. We're talking about the flow of information in
the opposite direction, speech-to-text. We need an Emacs Listens. We
don't have one, so I had to seek help from outside the Emacs world via
Voice In Plus. This runs in the Google Chrome web browser, and it's
very good for speech-to-text and very easy to learn how to use. It also
has some speech-to-commands. However, Talon Voice is much better
with the speech-to-commands, and it's also great at speech-to-code.
[slide 4]
The motivations are, obviously, as I mentioned already, for improved
productivity. So, if you're a fast typist who types faster than they can speak,
then nonetheless you might still benefit from voice computing when you
grow tired of using the keyboard. On the other hand, you might be a
slow typist who talks faster than they can type. In this case, you're
definitely going to benefit from dictation because you'll be able to encode
more words in text documents in a given day. If you're a coder,
then you may get a kick out of opening programs and websites and
coding projects by using your voice.
Then there are health-related reasons. You may have impaired use
of your hands, eyes, or both due to accident or disease, or you may
suffer from a repetitive stress injury. Many of us have a mild
but chronic form of it. We can't take a three-month sabbatical from
the keyboard without losing our jobs, so these injuries tend to persist.
And then you may have learned that it's not good for your health to
sit for prolonged periods of time staring at a computer screen.
You can actually dictate to your computer from 20 feet away while
looking out the window, thereby giving your lower body a break
and your eyes a break.
[slide 5]
I'm not God, so I have to bring data. I have two data points here,
the number of words that I wrote in June and July this year and in
September and October. I adopted the use of voice computing in
the middle of August. As you can see, I got an over three-fold increase
in my output.
[slide 6]
So this is the Chrome store website for Voice In. It's only available
for Google Chrome. You just hit the install button to install it. To configure it,
you need to select a language. It has support for 40 languages and
it supports about a dozen different dialects of English, including Australian.
[slide 7]
It works on web pages with text areas. I use it regularly on
Overleaf and 750words.com, a distraction-free environment for writing.
It also works in webmails. It works in Google. It works in Jupyter Lab,
of course, because that runs in the browser. It also works in Jupyter
Notebook and Colab Notebook. It should work in Cloudmacs. I've
mapped option-L to opening Voice In when the cursor is on a web page
that has a text area. The presence of a text area is the main limiting factor.
[slide 8]
Voice In has a number of built-in commands. You can turn it off by saying
"stop dictation". It doesn't distinguish between a command mode and
a dictation mode. It has an undo command. You use the command
"copy that" to copy a selection. You say "press enter" to issue a
command or submit text that has been written in a web form, and then
"press tab" will open up the next tab in a web browser. The scroll up
and down will allow you to navigate a web page. I've put together a quiz
about these commands so that you can go through it several
times until you get at least 90 percent of the questions correct,
in order to boost your recall of the commands.
I have a Python script for the quiz; you can probably pound through it
in less than a minute once you know the commands. I also
provide an Elisp version of this quiz, but it's a little slower to operate.
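
The scoring rule behind such a quiz can be sketched in a few lines of Python. This is not the script from the repository: the prompts are paraphrased from the commands named in this slide, and `QUIZ`, `PASS_MARK`, `grade`, and `passed` are hypothetical names chosen for illustration.

```python
# Hypothetical quiz sketch; not the published quiz script.
# Each prompt is paired with a Voice In command described in the talk.
QUIZ = {
    "Turn off dictation": "stop dictation",
    "Copy the current selection": "copy that",
    "Submit the text in a web form": "press enter",
    "Scroll toward the top of the page": "scroll up",
    "Scroll toward the bottom of the page": "scroll down",
}

PASS_MARK = 0.9  # the 90 percent target mentioned in the talk


def grade(answers):
    """Return the fraction of prompts answered correctly.

    `answers` maps prompts to typed responses; the comparison
    ignores case and surrounding whitespace.
    """
    correct = sum(
        1
        for prompt, expected in QUIZ.items()
        if answers.get(prompt, "").strip().lower() == expected
    )
    return correct / len(QUIZ)


def passed(answers):
    """True when the score meets or exceeds PASS_MARK."""
    return grade(answers) >= PASS_MARK
```

Repeating the quiz until `passed` returns `True` mirrors the "go through it several times" advice above.
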
[slide 9]
These are some common errors that I've run into with Voice In. It
likes to contract statements like "I will" into "I'll". Contractions are not
used in formal writing, and most of my writing is formal writing, so
this annoys me. I will show you how I corrected for that problem. It
also drops the first word in sentences quite often. This might be some
speech issue that I have. It inserts the wrong word because it's not
in the dictionary that was used to train it. So, for example, the word
PyMOL is the name of a molecular graphics program that we use in
our field. It doesn't recognize PyMOL. Instead, it substitutes in the
word "primal". Since I don't use "primal" very often, I've mapped the
word "primal" to "PyMOL" in some custom commands I'll talk about
in a minute. Then there's a problem that the commands that exist might
get executed when you speak them when, in fact, you wanted to use
the words in those commands during your dictation. So this is a problem,
a pitfall of Voice In, in that it doesn't have a command mode that's
separate from a dictation mode.
[slide 10]
You can set up through a very easy-to-use GUI custom voice
commands mapped to what you want inserted, so this is how
misinterpreted words can be corrected. You just map the misinterpreted
word to the intended word. You can also map the contractions to their
expansions. I did this for 94 English contractions, and you can find
these on GitHub. You can also insert acronyms and expand those
acronyms. I apply the same approach to the first names of colleagues.
I say "expand Fred", for example, to get Fred's first and last name
with the correct spelling of his very long German name. You can also
insert other trivia like favorite URLs. You can insert LaTeX snippets.
It handles correctly multi-line snippets. You just have to enclose them
in double quotes. You can even insert BibTeX cite keys for references
that you use frequently. All fields have certain key references for certain
methods or topics.
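
The mapping idea on this slide can be illustrated outside of Voice In with a short Python sketch. Voice In itself is configured through its GUI, so nothing here is its API; `REPLACEMENTS` holds a few sample entries (the published library covers 94 contractions), and `expand` is a hypothetical helper name.

```python
import re

# Sample entries only; chosen for illustration.
REPLACEMENTS = {
    "I'll": "I will",
    "don't": "do not",
    "it's": "it is",
    "primal": "PyMOL",  # misrecognized word mapped to the intended word
}

# Look up matches case-insensitively by lowercased key.
_LOOKUP = {spoken.lower(): written for spoken, written in REPLACEMENTS.items()}

# Longer keys come first so they win over shorter overlapping keys.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(_LOOKUP, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def expand(text):
    """Replace each known spoken form in `text` with its written form."""

    def _swap(match):
        spoken = match.group(0)
        written = _LOOKUP[spoken.lower()]
        # Keep a sentence-initial capital, e.g. "It's" becomes "It is".
        if spoken[0].isupper():
            written = written[0].upper() + written[1:]
        return written

    return _PATTERN.sub(_swap, text)
```

For example, `expand("I'll open primal because it's fast.")` returns `"I will open PyMOL because it is fast."`. The word-boundary anchors keep a word such as "primally" untouched.
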
[slide 11]
Then it has a set of commands that you can customize for the purpose
of speech-to-commands to get the computer to do something like open
up a specific website or save the current writing. In this case, we have
"press: command-s" for saving the current writing. You can change the
language with "lang:", and you can change the case of the text with "case:".
[slide 12]
But the speech-to-command repertoire is quite limited in Voice In,
so it's now time to pick up on Talon Voice. This is an open source project.
It's free. It is highly configurable via TalonScript, which is a subset of
Python. You can use either TalonScript or Python to configure it, but it's
easier to code up your configuration in TalonScript. It has a Python
interpreter embedded in it, so you don't have to mess around with
installing yet another Python interpreter. It runs on all platforms, and it
has a dictation mode that's separate from a command mode. You can
activate it, and it'll be in a listening state asleep. You just bark out
"Talon Wake" to start to wake it up, and "Talon Sleep" to have it go into
a listening state. It has a very welcoming community in the Talon Slack
channel. Then I need to point out that there's several packages that
others have developed that run on top of Talon, but one of particular note
is by Pokey Rule. He has on his website some really well-done videos
that demonstrate how he uses Cursorless to move the cursor around
using voice commands. This, however, runs on VS Code. At least that's
the text editor for which he's primarily developing Cursorless.
[slide 13]
I followed the install protocol outlined by Tara Roys. She has a collection
of tutorials on YouTube as well as on GitHub that are quite helpful. I
followed her tutorial for installing Talon on macOS without any issues,
but allow for half an hour to an hour to go through the process. When
you're done, you'll have this Talon icon appear in the toolbar on the Mac.
When it has this diagonal line across it, that means it's in the sleep state.
So, this leads to cascading pull-down menus. This is it for the GUI. One
of your first tasks is to select a language model that will be used to
interpret the sounds that you generate as words. The other
key feature is that, under scripting, there's a view log pull-down
that opens up a window displaying the log file. Whenever you make a
change in a Talon configuration file, that change is implemented
immediately. You do not have to restart Talon to get the change
to take effect.
[slide 14]
This is an example of a Talon file. It has two components. It has a header
above the dash that describes the scope of the commands contained
below the dash. Each command is separated by a blank line. If a voice
command is mapped to multiple actions, these are listed separately on
indented lines below the first line. The words that are in square brackets
are optional. So, I have mapped the phrase
"toggle voice in" to the keyboard shortcut Alt-L in order to toggle Voice In
on or off. If I toggle Voice In on, I need to immediately toggle off Talon,
and this is done through this key command for Control-T, which is
mapped to speech toggle. Then there are a
couple of other examples. The header is an optional
feature of Talon files; if no header is present, the commands in the file will apply in all
situations, in all modes.
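
The layout described on this slide can be made concrete with a toy Python parser. It is a sketch under stated assumptions, not Talon's own parser: `SAMPLE` is a hypothetical file whose `key(alt-l)` and `key(ctrl-t)` lines are an assumed rendering of the shortcuts mentioned above, the `save [the] file` command is invented to show an optional word, and `parse_talon` and `spoken_forms` are illustrative names.

```python
import re

# Hypothetical file contents: header above the dash, commands below it.
SAMPLE = """\
os: mac
-
toggle voice in:
    key(alt-l)
    key(ctrl-t)

save [the] file:
    key(cmd-s)
"""


def parse_talon(text):
    """Split a simplified Talon-style file into (header, commands)."""
    lines = text.splitlines()
    dash_rows = [i for i, line in enumerate(lines) if line.strip() == "-"]
    if dash_rows:
        header_lines = lines[: dash_rows[0]]
        body_lines = lines[dash_rows[0] + 1 :]
    else:
        # The header is optional: with no dash, every line is a command.
        header_lines, body_lines = [], lines

    header = {}
    for line in header_lines:
        if line.strip():
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()

    commands = {}
    current = None
    for line in body_lines:
        if not line.strip():
            continue  # blank lines only separate commands
        if line[0].isspace():
            # Indented line: one more action for the current command.
            if current is not None:
                commands[current].append(line.strip())
        else:
            phrase, _, inline_action = line.partition(":")
            current = phrase.strip()
            commands[current] = [inline_action.strip()] if inline_action.strip() else []
    return header, commands


def spoken_forms(phrase):
    """Expand bracketed optional words into every phrase that would match."""
    match = re.search(r"\[([^\]]+)\]", phrase)
    if match is None:
        return [" ".join(phrase.split())]
    with_word = phrase[: match.start()] + match.group(1) + phrase[match.end() :]
    without_word = phrase[: match.start()] + phrase[match.end() :]
    return spoken_forms(with_word) + spoken_forms(without_word)
```

Calling `parse_talon(SAMPLE)` yields the header `{"os": "mac"}` and two commands, the first with two actions. `spoken_forms("save [the] file")` returns `["save the file", "save file"]`.
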
[slide 15]
Here we have two restrictions. These commands will only work when
using the iTerm2 terminal emulator for the Mac, and then only when the
title of the window in ccc has this particular address, which is what
appears when I've logged into the supercomputer at the University of
Oklahoma. One of the commands in this file is checkjobs. It's mapped
to an alias, a bash alias called cj for "check jobs", which in turn is mapped
to a script called checkjobs.sh that, when it's run, returns a listing of the
pending and running jobs on the supercomputer in a format that I find
pleasing. This backslash n after cj, the new line character, enters the
command, so I don't have to do that as an additional step. Likewise,
here's a similar setup for interacting with an Ubuntu virtual machine.
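
The two restrictions can be pictured as a guard that must pass before a spoken phrase produces any keystrokes. The following Python sketch models that idea only; it does not use Talon's API. The `Context` class, the `text_to_type` helper, and the host name `login.cluster.example.edu` are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Context:
    """Commands that apply only in one app with a matching window title."""

    app: str
    title_pattern: str
    commands: dict = field(default_factory=dict)

    def matches(self, app, title):
        """True when both the app name and the window title fit."""
        return app == self.app and re.search(self.title_pattern, title) is not None


CLUSTER = Context(
    app="iterm2",
    title_pattern=r"login\.cluster\.example\.edu",  # hypothetical host name
    # The trailing newline submits the alias, so no extra Enter is needed.
    commands={"check jobs": "cj\n"},
)


def text_to_type(context, app, title, phrase):
    """Return the text to type, or None if the context or phrase does not apply."""
    if not context.matches(app, title):
        return None
    return context.commands.get(phrase)
```

With this model, "check jobs" produces `"cj\n"` only while the terminal window title contains the cluster address; in any other window the phrase does nothing.
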
[slide 16]
In terms of picking up voice computing, these are my recommendations.
You're going to run into more errors than you may like initially, and so
you need some patience in dealing with those. And also, it'll take you
a while to get your head wrapped around Talon and how it works. You'll
definitely want to use custom commands to correct the errors or
shortcomings of the language models. And you've seen how, by
opening up projects by voice commands, you can reduce friction in
terms of restarting work on a project. You've seen how Voice In is
preferred for more accurate dictation. I think my error rate is about
1 to 2 percent. That is, 1 to 2 out of 100 words are incorrect versus
Talon Voice where I think the error rate is closer to 5 percent. I have
put together a library of English contractions and their expansions for
Talon too, and they can be found here on GitHub. And I also have posted
a quiz of 600 questions about some basic Talon commands.
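
The error rates quoted on this slide are informal estimates. One way to put a number on them, assuming you keep the intended text alongside the dictated text, is a word error rate based on word-level edit distance. The function name `word_error_rate` and the sample sentences below are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    `reference` is the intended text and `hypothesis` is what the
    speech recognizer produced. Substitutions, insertions, and
    deletions each count as one error.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1] / len(ref)
```

Two wrong words in a 100-word passage give a rate of 0.02, which corresponds to the "2 out of 100 words" figure. A dropped first word, as in "the cat sat" dictated as "cat sat", counts as one deletion.
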
[slide 17]
I'd like to thank the people who've helped me out on the Talon Slack
channel and members of the Oklahoma Data Science Workshop where
I gave an hour-long talk on this topic several weeks ago. I'd like to thank
my friends at the Berlin and Austin Emacs Meetup and at the M-x research
Slack channel. And I thank these grant funding agencies for supporting
my work. I'll be happy to take any questions.
[[!inline pages="internal(2023/info/voice-after)" raw="yes"]]
[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]