WEBVTT captioned by sachac
00:00:00.000 --> 00:00:04.359
Hi, I'm Blaine Mooers. I'm an associate professor
00:00:04.360 --> 00:00:06.519
of biochemistry at the University of Oklahoma
00:00:06.520 --> 00:00:09.319
Health Sciences Center in Oklahoma City.
00:00:09.320 --> 00:00:12.959
My lab studies the role of RNA structure in RNA editing.
00:00:12.960 --> 00:00:17.199
We use X-ray crystallography to study the structures
00:00:17.200 --> 00:00:19.919
of these RNAs. We spend a lot of time in the lab
00:00:19.920 --> 00:00:22.719
preparing our samples for structural studies,
00:00:22.720 --> 00:00:26.719
and then we also spend a lot of time at the computer
00:00:26.720 --> 00:00:29.719
analyzing the resulting data.
00:00:29.720 --> 00:00:33.039
I was seeking ways of using voice computing
00:00:33.040 --> 00:00:37.399
to try to enhance my productivity.
00:00:37.400 --> 00:00:41.319
I divide voice computing into three activities,
00:00:41.320 --> 00:00:44.959
speech-to-text or dictation, speech-to-commands,
00:00:44.960 --> 00:00:47.639
and speech-to-code. I'll be talking about
00:00:47.640 --> 00:00:50.159
speech-to-text and speech-to-commands today
00:00:50.160 --> 00:00:55.079
because these are two activities
00:00:55.080 --> 00:00:57.319
that are probably most broadly applicable
00:00:57.320 --> 00:01:02.559
to the workflows of people attending this conference.
00:01:02.560 --> 00:01:06.799
This talk will not be about Emacspeak.
00:01:06.800 --> 00:01:11.359
This is a verbal program for converting text to speech.
00:01:11.360 --> 00:01:13.319
We're talking about the flow of information
00:01:13.320 --> 00:01:16.519
in the opposite direction, speech-to-text.
00:01:16.520 --> 00:01:20.599
We need an Emacs Listens. We don't have one,
00:01:20.600 --> 00:01:25.479
so I had to seek help from outside the Emacs world
00:01:25.480 --> 00:01:30.639
via Voice In Plus. This runs in
00:01:30.640 --> 00:01:33.639
the Google Chrome web browser,
00:01:33.640 --> 00:01:36.719
and it's very good for speech-to-text
00:01:36.720 --> 00:01:39.519
and very easy to learn how to use.
00:01:39.520 --> 00:01:41.999
It also has some speech-to-commands.
00:01:42.000 --> 00:01:44.799
However, Talon Voice is much better
00:01:44.800 --> 00:01:47.559
with the speech-to-commands,
00:01:47.560 --> 00:01:53.519
and it's also great at speech-to-code.
NOTE Motivations
00:01:53.520 --> 00:01:57.239
So, the motivations are, obviously, as I mentioned already,
00:01:57.240 --> 00:01:59.159
for improved productivity.
00:01:59.160 --> 00:02:00.399
So, if you're a fast typist
00:02:00.400 --> 00:02:05.199
who types faster than they can speak,
00:02:05.200 --> 00:02:07.079
then nonetheless you might still benefit
00:02:07.080 --> 00:02:09.279
from voice computing when you grow tired of
00:02:09.280 --> 00:02:12.199
using the keyboard. On the other hand,
00:02:12.200 --> 00:02:15.199
you might be a slow typist who talks faster
00:02:15.200 --> 00:02:17.519
than they can type.
00:02:17.520 --> 00:02:19.759
In this case, you're definitely going to
00:02:19.760 --> 00:02:22.859
benefit from dictation because you'll be able to
00:02:22.860 --> 00:02:29.359
encode more words in text documents in a given day.
00:02:29.360 --> 00:02:33.639
If you're a coder, then you may get a kick out of
00:02:33.640 --> 00:02:36.999
opening programs and websites and coding projects
00:02:37.000 --> 00:02:39.279
by using your voice.
00:02:39.280 --> 00:02:41.719
Then there are health-related reasons.
00:02:41.720 --> 00:02:44.599
You may have impaired use of your hands, eyes, or both
00:02:44.600 --> 00:02:49.199
due to accident or disease, or you may suffer from
00:02:49.200 --> 00:02:53.519
a repetitive stress injury. Many of us have
00:02:53.520 --> 00:02:55.759
a mild but chronic form of it.
00:02:55.760 --> 00:02:59.039
We can't take a three-month sabbatical from the keyboard
00:02:59.040 --> 00:03:05.519
without losing our jobs, so these injuries tend to persist.
00:03:05.520 --> 00:03:06.679
And then you may have learned
00:03:06.680 --> 00:03:09.959
that it's not good for your health to sit
00:03:09.960 --> 00:03:11.919
for prolonged periods of time
00:03:11.920 --> 00:03:14.919
while staring at a computer screen.
00:03:14.920 --> 00:03:21.799
You can actually dictate to your computer from 20 feet away
00:03:21.800 --> 00:03:24.999
while looking out the window,
00:03:25.000 --> 00:03:27.779
thereby giving your lower body a break
00:03:27.780 --> 00:03:33.239
and your eyes a break.
NOTE Data
00:03:33.240 --> 00:03:35.639
I'm not God, so I have to bring data.
00:03:35.640 --> 00:03:38.039
I have two data points here,
00:03:38.040 --> 00:03:42.399
the number of words that I wrote in June and July this year
00:03:42.400 --> 00:03:45.159
and in September and October.
00:03:45.160 --> 00:03:49.519
I adopted the use of voice computing
00:03:49.520 --> 00:03:53.919
in the middle of August. As you can see,
00:03:53.920 --> 00:03:58.679
I got an over three-fold increase in my output.
NOTE Voice In in the Chrome Store
00:03:58.680 --> 00:04:07.119
So this is the Chrome store website for Voice In.
00:04:07.120 --> 00:04:11.119
So it's only available for Google Chrome.
00:04:11.120 --> 00:04:13.239
You just hit the install button to install it.
00:04:13.240 --> 00:04:16.639
To configure it, you need to select a language.
00:04:16.640 --> 00:04:19.559
It has support for 40 languages
00:04:19.560 --> 00:04:23.119
and it supports about a dozen different dialects of English,
00:04:23.120 --> 00:04:29.959
including Australian. It works on web pages with text areas.
00:04:29.960 --> 00:04:33.319
I use it regularly
00:04:33.320 --> 00:04:37.879
on Overleaf and 750words.com,
00:04:37.880 --> 00:04:42.279
a distraction-free environment for writing.
00:04:42.280 --> 00:04:46.239
It also works in webmail. It works in Google.
00:04:46.780 --> 00:04:51.319
It works in JupyterLab, of course,
00:04:51.320 --> 00:04:52.879
because that runs in the browser.
00:04:52.880 --> 00:04:57.999
It also works in Jupyter Notebook and Colab Notebook.
00:04:58.000 --> 00:05:01.319
It should work in Cloudmacs.
00:05:01.320 --> 00:05:04.159
I've mapped Option-L to opening Voice In
00:05:04.160 --> 00:05:09.119
when the cursor is on a web page that has a text area.
00:05:09.120 --> 00:05:16.879
So that's the main limiting factor.
NOTE Built-in commands in Voice In Plus
00:05:16.880 --> 00:05:19.159
So it has a number of built-in commands.
00:05:19.160 --> 00:05:24.879
You can turn it off by saying stop dictation.
00:05:24.880 --> 00:05:26.119
It doesn't distinguish between
00:05:26.120 --> 00:05:28.799
a command mode and a dictation mode.
00:05:28.800 --> 00:05:33.599
It has an undo command. You use the command
00:05:33.600 --> 00:05:36.919
"copy that" to copy a selection.
00:05:36.920 --> 00:05:40.079
And the `press` commands are used in the browser,
00:05:40.080 --> 00:05:44.839
so you say "press enter" to submit a command or text
00:05:44.840 --> 00:05:50.319
that has been written in a web form,
00:05:50.320 --> 00:05:55.279
and then "press tab" will open up the next tab
00:05:55.280 --> 00:05:58.599
in a web browser. The scroll up and down
00:05:58.600 --> 00:06:02.379
will allow you to navigate a web page.
00:06:02.380 --> 00:06:05.819
I've put together a quiz about these commands
00:06:05.820 --> 00:06:09.559
so that you can go through this quiz several times
00:06:09.560 --> 00:06:14.699
until you get at least 90 percent
00:06:14.700 --> 00:06:16.679
of the questions correct.
00:06:16.680 --> 00:06:20.599
To boost your recall of the commands,
00:06:20.600 --> 00:06:23.799
I have a Python script that lets you
00:06:23.800 --> 00:06:26.559
pound through the quiz
00:06:26.560 --> 00:06:32.159
in less than a minute, once you know the commands.
00:06:32.160 --> 00:06:35.599
I also provide an Elisp version of this quiz,
00:06:35.600 --> 00:06:41.739
but it's a little slower to operate.
NOTE Common errors
00:06:41.740 --> 00:06:43.399
These are some common errors
00:06:43.400 --> 00:06:45.399
that I've run into with Voice In.
00:06:45.400 --> 00:06:50.319
It likes to contract statements like "I will" into "I'll".
00:06:50.320 --> 00:06:55.599
Contractions are not used in formal writing,
00:06:55.600 --> 00:07:00.359
and most of my writing is formal writing, so this annoys me.
00:07:00.360 --> 00:07:04.759
I will show you how I corrected for that problem.
00:07:04.760 --> 00:07:10.039
It also drops the first word in sentences quite often.
00:07:10.040 --> 00:07:13.359
This might be some speech issue that I have.
00:07:13.360 --> 00:07:17.599
It inserts the wrong word because it's not in the dictionary
00:07:17.600 --> 00:07:22.619
that was used to train it. So, for example,
00:07:22.620 --> 00:07:26.919
the word PyMOL is the name of a molecular graphics program
00:07:26.920 --> 00:07:31.639
that we use in our field. It doesn't recognize PyMOL.
00:07:31.640 --> 00:07:34.239
Instead, it substitutes in the word "primal".
00:07:34.240 --> 00:07:38.399
Since I don't use "primal" very often,
00:07:38.400 --> 00:07:42.299
I've mapped the word "primal" to "PyMOL"
00:07:42.300 --> 00:07:45.659
in some custom commands I'll talk about in a minute.
00:07:45.660 --> 00:07:50.439
Then there's a problem that the commands that exist
00:07:50.440 --> 00:07:54.439
might get executed when you speak them when, in fact,
00:07:54.440 --> 00:07:58.839
you wanted to use the words in those commands
00:07:58.840 --> 00:08:01.439
during your dictation.
00:08:01.440 --> 00:08:07.119
So this is a problem, a pitfall of Voice In,
00:08:07.120 --> 00:08:08.919
in that it doesn't have a command mode
00:08:08.920 --> 00:08:14.759
that's separate from a dictation mode.
NOTE Custom speech-to-text commands
00:08:14.760 --> 00:08:20.319
So you can set up through a very easy-to-use GUI
00:08:20.320 --> 00:08:26.959
custom voice commands mapped to what you want inserted.
00:08:26.960 --> 00:08:32.399
So this is how misinterpreted words can be corrected.
00:08:32.400 --> 00:08:35.759
You just map the misinterpreted word to the intended word.
00:08:35.760 --> 00:08:42.839
You can also map the contractions to their expansions.
00:08:42.840 --> 00:08:46.959
I did this for 94 English contractions,
00:08:46.960 --> 00:08:50.139
and you can find this on GitHub.
00:08:50.140 --> 00:08:56.079
You can also insert acronyms and expand those acronyms.
00:08:56.080 --> 00:09:00.239
I apply the same approach to the first names of colleagues.
00:09:00.240 --> 00:09:03.759
I say "expand Fred", for example,
00:09:03.760 --> 00:09:06.999
to get Fred's first and last name with the spelling
00:09:07.000 --> 00:09:12.599
of his very long German name.
00:09:12.600 --> 00:09:19.399
You can also insert other trivia like favorite URLs.
00:09:19.400 --> 00:09:24.559
You can insert a lot of text snippets,
00:09:24.560 --> 00:09:34.799
and it correctly handles multi-line snippets.
00:09:34.800 --> 00:09:39.419
You just have to enclose them in double quotes.
00:09:39.420 --> 00:09:45.039
You can even insert BibTeX cite keys for references
00:09:45.040 --> 00:09:46.879
that you use frequently. All fields
00:09:46.880 --> 00:09:59.419
have certain key references for certain methods or topics.
00:09:59.420 --> 00:10:05.079
Then it has a set of commands that you can customize
00:10:05.080 --> 00:10:08.199
for the purpose of speech to commands
00:10:08.200 --> 00:10:09.679
to get the computer to do something
00:10:09.680 --> 00:10:15.399
like open up a specific website or save the current writing.
00:10:15.400 --> 00:10:19.919
In this case, the "press" command
00:10:19.920 --> 00:10:27.759
is mapped to command-S for saving the current writing.
00:10:27.760 --> 00:10:28.099
You can change the language,
00:10:28.100 --> 00:10:37.539
and you can change the case of the text.
NOTE Introducing Talon Voice
00:10:37.540 --> 00:10:41.039
But the speech to command repertoire is quite limited
00:10:41.040 --> 00:10:49.759
in Voice In, so it's now time to pick up on Talon Voice.
00:10:49.760 --> 00:10:54.119
This is an open source project. It's free.
00:10:54.120 --> 00:10:57.399
It is highly configurable via TalonScript,
00:10:57.400 --> 00:10:58.959
which is a subset of Python.
00:10:58.960 --> 00:11:03.039
You can use either TalonScript or Python to configure it,
00:11:03.040 --> 00:11:06.279
but it's easier to code up your configuration
00:11:06.280 --> 00:11:08.399
in TalonScript.
00:11:08.400 --> 00:11:10.759
It has a Python interpreter embedded in it,
00:11:10.760 --> 00:11:12.999
so you don't have to mess around with installing
00:11:13.000 --> 00:11:14.559
yet another Python interpreter.
00:11:14.560 --> 00:11:21.519
It runs on all platforms, and it has a dictation mode
00:11:21.520 --> 00:11:24.599
that's separate from a command mode.
00:11:24.600 --> 00:11:25.599
You can activate it,
00:11:25.600 --> 00:11:31.359
and it'll be asleep but still listening.
00:11:31.360 --> 00:11:36.279
You just bark out "Talon wake" to wake it up,
00:11:36.280 --> 00:11:43.799
and "Talon sleep" to put it back into the sleep state.
00:11:43.800 --> 00:11:47.919
It has a very welcoming community
00:11:47.920 --> 00:11:50.919
in the Talon Slack channel.
00:11:50.920 --> 00:11:56.399
Then I need to point out that there are several packages
00:11:56.400 --> 00:11:59.199
that others have developed that run on top of Talon,
00:11:59.200 --> 00:12:03.079
but one of particular note is by Pokey Rule.
00:12:03.080 --> 00:12:08.119
He has on his website some really well-done videos
00:12:08.120 --> 00:12:11.479
that demonstrate how he uses Cursorless
00:12:11.480 --> 00:12:17.239
to move the cursor around using voice commands.
00:12:17.240 --> 00:12:20.559
This, however, runs on VS Code.
00:12:20.560 --> 00:12:23.359
At least that's the text editor
00:12:23.360 --> 00:12:28.399
for which he's primarily developing Cursorless.
NOTE Talon GUI
00:12:28.400 --> 00:12:35.519
So, I followed the protocol outlined by Tara Roys.
00:12:35.520 --> 00:12:38.759
She has a collection of tutorials
00:12:38.760 --> 00:12:44.599
on YouTube as well as on GitHub that are quite helpful.
00:12:44.600 --> 00:12:49.479
I followed her tutorial for installing
00:12:49.480 --> 00:12:51.359
Talon on macOS without any issues,
00:12:51.360 --> 00:12:55.319
but allow for half an hour to an hour
00:12:55.320 --> 00:12:57.719
to go through the process. When you're done,
00:12:57.720 --> 00:13:02.199
you'll have this Talon icon appear in the toolbar
00:13:02.200 --> 00:13:06.119
on the Mac. When it has this diagonal line across it,
00:13:06.120 --> 00:13:09.539
that means it's in the sleep state.
00:13:09.540 --> 00:13:13.519
So, this leads to cascading pull-down menus.
00:13:13.520 --> 00:13:19.639
This is it for the GUI.
00:13:19.640 --> 00:13:26.519
One of your first tasks is to select a language model
00:13:26.520 --> 00:13:30.439
that will be used to interpret
00:13:30.440 --> 00:13:35.179
the sounds that you generate as words.
00:13:35.180 --> 00:13:38.959
The other key feature is that,
00:13:38.960 --> 00:13:43.399
under Scripting, there's a View Log pull-down
00:13:43.400 --> 00:13:48.399
that opens up a window displaying the log file.
00:13:48.400 --> 00:13:52.879
Whenever you make a change in a Talon configuration file,
00:13:52.880 --> 00:13:55.079
that change is implemented immediately.
00:13:55.080 --> 00:13:57.599
You do not have to restart Talon
00:13:57.600 --> 00:14:02.539
to get the change to take effect.
00:14:02.540 --> 00:14:04.759
So, this is an example of a Talon file.
00:14:04.760 --> 00:14:10.499
It has two components. It has a header above the dash that describes
00:14:10.500 --> 00:14:14.919
the scope of the commands contained below the dash.
00:14:14.920 --> 00:14:19.739
Each command is separated by a blank line.
00:14:19.740 --> 00:14:24.239
If a voice command is mapped to multiple actions,
00:14:24.240 --> 00:14:30.999
these are listed separately on indented lines
00:14:31.000 --> 00:14:33.599
below the first line.
00:14:33.600 --> 00:14:39.419
The words that are in square brackets are optional.
00:14:39.420 --> 00:14:44.319
So, I have mapped the word "toggle voice in",
00:14:44.320 --> 00:14:46.319
or rather the phrase "toggle voice in",
00:14:46.320 --> 00:14:51.279
to the keyboard shortcut Alt-L
00:14:51.280 --> 00:14:54.999
in order to toggle Voice In on or off.
00:14:55.000 --> 00:14:57.879
If I toggle Voice In on,
00:14:57.880 --> 00:15:01.759
I need to immediately toggle off Talon,
00:15:01.760 --> 00:15:09.079
and this is done through this key command for Control T,
00:15:09.080 --> 00:15:11.079
which is mapped to speech toggle.
00:15:11.080 --> 00:15:20.399
Then there are
00:15:20.400 --> 00:15:24.079
a couple of other examples.
00:15:24.080 --> 00:15:26.439
So, if there's no header present,
00:15:26.440 --> 00:15:29.599
it's an optional feature of Talon files,
00:15:29.600 --> 00:15:32.639
then the commands in the file will apply in all situations,
00:15:32.640 --> 00:15:36.959
in all modes. Here we have two restrictions.
00:15:36.960 --> 00:15:38.959
These commands will only work
00:15:38.960 --> 00:15:42.959
when using the iTerm2 terminal emulator for the Mac,
00:15:42.960 --> 00:15:48.239
and then only when the title of the window in iTerm2
00:15:48.240 --> 00:15:52.439
has this particular address,
00:15:52.440 --> 00:15:55.559
which is what appears when I've logged into
00:15:55.560 --> 00:16:00.059
the supercomputer at the University of Oklahoma.
00:16:00.060 --> 00:16:03.479
So, one of the commands in this file is checkjobs.
00:16:03.480 --> 00:16:05.539
It's mapped to an alias,
00:16:05.540 --> 00:16:10.919
a bash alias called cj for "check jobs",
00:16:10.920 --> 00:16:17.079
which in turn is mapped to a script called checkjobs.sh
00:16:17.080 --> 00:16:20.399
that, when it's run, returns a listing
00:16:20.400 --> 00:16:23.219
of the pending and running jobs on the supercomputer
00:16:23.220 --> 00:16:26.080
in a format that I find pleasing.
00:16:26.081 --> 00:16:34.559
So, this backslash n after cj, the newline character,
00:16:34.560 --> 00:16:39.839
enters the command. So, I don't have to do that
00:16:39.840 --> 00:16:43.799
as an additional step. And then, likewise,
00:16:43.800 --> 00:16:46.799
here's a similar setup for interacting with
00:16:46.800 --> 00:16:52.499
an Ubuntu virtual machine.
NOTE Recommendations
00:16:52.500 --> 00:16:55.919
So, in terms of picking up voice computing,
00:16:55.920 --> 00:16:57.479
these are my recommendations.
00:16:57.480 --> 00:16:59.759
You're going to run into more errors
00:16:59.760 --> 00:17:01.479
than you may like initially,
00:17:01.480 --> 00:17:07.839
and so you need some patience in dealing with those.
00:17:07.840 --> 00:17:09.919
And also, it'll take you a while
00:17:09.920 --> 00:17:16.799
to get your head wrapped around Talon and how it works.
00:17:16.800 --> 00:17:19.439
You'll definitely want to use these custom commands
00:17:19.440 --> 00:17:21.479
to correct the errors or shortcomings
00:17:21.480 --> 00:17:26.919
of the language models. And you've seen how,
00:17:26.920 --> 00:17:29.879
by opening up projects by voice commands,
00:17:29.880 --> 00:17:31.359
you can reduce friction
00:17:31.360 --> 00:17:36.659
in terms of restarting work on a project.
00:17:36.660 --> 00:17:40.399
You've seen how Voice In is preferred
00:17:40.400 --> 00:17:44.879
for more accurate dictation.
00:17:44.880 --> 00:17:48.079
I think my error rate is about 1 to 2 percent.
00:17:48.080 --> 00:17:53.879
That is, 1 to 2 out of 100 words are incorrect
00:17:53.880 --> 00:17:56.319
versus Talon Voice where I think
00:17:56.320 --> 00:17:59.879
the error rate is closer to 5 percent.
00:18:00.840 --> 00:18:04.759
I have also put together contractions for Talon,
00:18:04.760 --> 00:18:07.479
and they can be found here on GitHub.
00:18:07.480 --> 00:18:12.959
And I also have a quiz of 600 questions
00:18:12.960 --> 00:18:17.719
about some basic Talon commands.
00:18:17.720 --> 00:18:20.999
So, I'd like to thank the people who've helped me out
00:18:21.000 --> 00:18:22.159
on the Talon Slack channel
00:18:22.160 --> 00:18:25.799
and members of the Oklahoma Data Science Workshop
00:18:25.800 --> 00:18:29.879
where I gave an hour-long talk on this topic
00:18:29.880 --> 00:18:30.959
several weeks ago.
00:18:30.960 --> 00:18:34.159
I'd like to thank my friends
00:18:34.160 --> 00:18:37.399
at the Berlin and Austin Emacs Meetup
00:18:37.400 --> 00:18:42.659
and at the M-x Research Slack channel.
00:18:42.660 --> 00:18:45.119
And I thank these grant funding agencies
00:18:45.120 --> 00:18:48.880
for supporting my work. I'll be happy to take any questions.