Diffstat (limited to '2023/talks/voice.md')
-rw-r--r-- 2023/talks/voice.md | 216
1 files changed, 0 insertions, 216 deletions
diff --git a/2023/talks/voice.md b/2023/talks/voice.md
index 6d44de97..f12d5195 100644
--- a/2023/talks/voice.md
+++ b/2023/talks/voice.md
@@ -66,222 +66,6 @@ stress injury flared up while entering data in a spreadsheet. I
tripled my daily word count by using the speech-to-text, and I get a
kick out of running remote computers by speech-to-command.
-# Transcript
-
-[slide 1]
-Hi, I'm Blaine Mooers. I'm an associate professor of biochemistry at
-the University of Oklahoma Health Sciences Center in Oklahoma City.
-My lab studies the role of RNA structure in RNA editing. We use X-ray
-crystallography to study the structures of these RNAs. We spend a lot
-of time in the lab preparing our samples for structural studies, and then
-we also spend a lot of time at the computer analyzing the resulting data.
-I was seeking ways of using voice computing to try to enhance
-my productivity.
-
-[slide 2]
-I divide voice computing into three activities: speech-to-text (dictation),
-speech-to-commands, and speech-to-code. I'll be talking about
-speech-to-text and speech-to-commands today because these are two
-activities that are probably most broadly applicable to the workflows of
-people attending this conference.
-
-[slide 3]
-This talk will not be about Emacspeak. This is a venerated program for
-converting text to speech. We're talking about the flow of information in
-the opposite direction, speech-to-text. We need an Emacs Listens. We
-don't have one, so I had to seek help from outside the Emacs world via
-Voice In Plus. This runs in the Google Chrome web browser, and it's
-very good for speech-to-text and very easy to learn how to use. It also
-has some speech-to-commands. However, Talon Voice is much better
-with the speech-to-commands, and it's also great at speech-to-code.
-
-[slide 4]
-The main motivation, as I mentioned already, is improved
-productivity. If you're a fast typist who types faster than they can speak,
-you might still benefit from voice computing when you
-grow tired of using the keyboard. On the other hand, you might be a
-slow typist who talks faster than they can type. In this case, you're
-definitely going to benefit from dictation because you'll be able to encode
-more words in text documents in a given day. If you're a coder,
-then you may get a kick out of opening programs and websites and
-coding projects by using your voice.
-
-Then there are health-related reasons. You may have impaired use
-of your hands, eyes, or both due to accident or disease, or you may
-suffer from a repetitive stress injury. Many of us have a mild
-but chronic form of it. We can't take a three-month sabbatical from
-the keyboard without losing our jobs, so these injuries tend to persist.
-And then you may have learned that it's not good for your health to
-sit for prolonged periods of time staring at a computer screen.
-You can actually dictate to your computer from 20 feet away while
-looking out the window, thereby giving your lower body a break
-and your eyes a break.
-
-[slide 5]
-I'm not God, so I have to bring data. I have two data points here,
-the number of words that I wrote in June and July this year and in
-September and October. I adopted the use of voice computing in
-the middle of August. As you can see, I got a more than three-fold increase
-in my output.
-
-[slide 6]
-This is the Chrome Web Store page for Voice In. It's only available
-for Google Chrome. You just hit the install button to install it. To configure it,
-you need to select a language. It has support for 40 languages and
-it supports about a dozen different dialects of English, including Australian.
-
-[slide 7]
-It works on web pages with text areas. I use it regularly on
-Overleaf and 750words.com, a distraction-free environment for writing.
-It also works in webmail. It works in Google. It works in JupyterLab,
-of course, because that runs in the browser. It also works in Jupyter
-Notebook and Colab Notebook. It should work in Cloudmacs. I've
-mapped Option-L to opening Voice In when the cursor is on a web page
-that has a text area. The presence of a text area is the main limiting factor.
-
-[slide 8]
-Voice In has a number of built-in commands. You can turn it off by saying
-"stop dictation". It doesn't distinguish between a command mode and
-a dictation mode. It has an undo command. You use the command
-"copy that" to copy a selection. You say "press enter" to issue a
-command or submit text that has been written in a web form, and then
-"press tab" will open up the next tab in a web browser. The "scroll up"
-and "scroll down" commands let you navigate a web page. I've put together a quiz
-about these commands so that you can go through it several
-times until you get at least 90 percent of
-the questions correct. To boost your recall of the commands,
-I have a Python script that lets you pound through the quiz
-in less than a minute once you know the commands. I also
-provide an Elisp version of this quiz, but it's a little slower to operate.
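A minimal sketch of what such a quiz script could look like, assuming a plain list of question and answer pairs. The names `QUESTIONS`, `score_quiz`, `passed`, and `run_quiz` are hypothetical and are not taken from the speaker's script. The three sample commands are the ones mentioned in the talk.

```python
# Hypothetical quiz data: (question, expected spoken command).
QUESTIONS = [
    ("Which command turns Voice In off?", "stop dictation"),
    ("Which command copies the current selection?", "copy that"),
    ("Which command submits text written in a web form?", "press enter"),
]


def score_quiz(questions, answers):
    """Return the fraction of answers that match, ignoring case and outer spaces."""
    correct = sum(
        1
        for (_, expected), given in zip(questions, answers)
        if given.strip().lower() == expected.lower()
    )
    return correct / len(questions)


def passed(score, threshold=0.9):
    """True when the score reaches the 90 percent target mentioned in the talk."""
    return score >= threshold


def run_quiz(questions, ask=input):
    """Ask every question once and return the score (interactive helper)."""
    answers = [ask(prompt + " ") for prompt, _ in questions]
    return score_quiz(questions, answers)
```

Calling `run_quiz(QUESTIONS)` in a terminal and repeating until `passed(...)` returns `True` mirrors the repeat-until-90-percent routine described in the talk.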
-
-[slide 9]
-These are some common errors that I've run into with Voice In. It
-likes to contract statements like "I will" into "I'll". Contractions are not
-used in formal writing, and most of my writing is formal writing, so
-this annoys me. I will show you how I corrected for that problem. It
-also drops the first word in sentences quite often. This might be some
-speech issue that I have. It inserts the wrong word when the intended word is not
-in the dictionary that was used to train it. So, for example, the word
-PyMOL is the name of a molecular graphics program that we use in
-our field. It doesn't recognize PyMOL. Instead, it substitutes in the
-word "primal". Since I don't use "primal" very often, I've mapped the
-word "primal" to "PyMOL" in some custom commands I'll talk about
-in a minute. Another problem is that the built-in commands might
-get executed when you speak them when, in fact, you wanted to use
-the words in those commands in your dictation. This is
-a pitfall of Voice In: it doesn't have a command mode that's
-separate from a dictation mode.
-
-[slide 10]
-Through a very easy-to-use GUI, you can set up custom voice
-commands mapped to what you want inserted. This is how
-misinterpreted words can be corrected. You just map the misinterpreted
-word to the intended word. You can also map the contractions to their
-expansions. I did this for 94 English contractions, and you can find
-these on GitHub. You can also insert acronyms and expand those
-acronyms. I apply the same approach to the first names of colleagues.
-I say "expand Fred", for example, to get Fred's first and last name
-with the correct spelling of his very long German name. You can also
-insert other trivia like favorite URLs. You can insert LaTeX snippets.
-It correctly handles multi-line snippets. You just have to enclose them
-in double quotes. You can even insert BibTeX cite keys for references
-that you use frequently. All fields have certain key references for certain
-methods or topics.
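The mapping idea can be sketched in Python as a lookup table applied to dictated text. This is only an illustration of the concept, not how Voice In implements custom commands. The table `EXPANSIONS` and the function `expand` are hypothetical names, and the table holds just a few sample entries rather than the 94 contractions mentioned in the talk.

```python
import re

# Hypothetical sample table: misrecognized words and contractions
# mapped to the text that should be inserted instead.
EXPANSIONS = {
    "I'll": "I will",
    "don't": "do not",
    "can't": "cannot",
    "primal": "PyMOL",
}


def expand(text, table=EXPANSIONS):
    """Replace whole-word matches from the table, ignoring case."""
    lookup = {key.lower(): value for key, value in table.items()}
    # Longest keys first so that a longer entry wins over a shorter prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )

    def swap(match):
        found = match.group(0)
        out = lookup[found.lower()]
        # Keep a sentence-initial capital, e.g. "Don't" becomes "Do not".
        if found[0].isupper() and out[0].islower():
            out = out[0].upper() + out[1:]
        return out

    return pattern.sub(swap, text)
```

Mapping "primal" to "PyMOL" is only safe because, as the speaker notes, "primal" rarely occurs in his own writing. A table like this has to be tuned to each writer's vocabulary.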
-
-[slide 11]
-Then it has a set of commands that you can customize for the purpose
-of speech-to-commands to get the computer to do something like open
-up a specific website or save the current writing. In this case, we have
-"press: command-s" for saving the current writing. You can change the
-language with "lang:", and you can change the case of the text with "case:".
-
-[slide 12]
-But the speech-to-command repertoire is quite limited in Voice In,
-so it's now time to pick up on Talon Voice. This is an open source project.
-It's free. It is highly configurable via TalonScript, which is a subset of
-Python. You can use either TalonScript or Python to configure it, but it's
-easier to code up your configuration in TalonScript. It has a Python
-interpreter embedded in it, so you don't have to mess around with
-installing yet another Python interpreter. It runs on all platforms, and it
-has a dictation mode that's separate from a command mode. You can
-activate it, and it'll be asleep but still listening. You just bark out
-"Talon Wake" to wake it up, and "Talon Sleep" to have it go back into
-that sleeping-but-listening state. It has a very welcoming community in the Talon Slack
-channel. I should also point out that there are several packages that
-others have developed that run on top of Talon. One of particular note
-is Cursorless by Pokey Rule. His website has some really well-done videos
-that demonstrate how he uses Cursorless to move the cursor around
-using voice commands. This, however, runs on VS Code. At least that's
-the text editor for which he's primarily developing Cursorless.
-
-[slide 13]
-I followed the install protocol outlined by Tara Roys. She has a collection
-of tutorials on YouTube as well as on GitHub that are quite helpful. I
-followed her tutorial for installing Talon on macOS without any issues,
-but allow for half an hour to an hour to go through the process. When
-you're done, you'll have this Talon icon appear in the toolbar on the Mac.
-When it has this diagonal line across it, that means it's in the sleep state.
-The icon leads to cascading pull-down menus. This is it for the GUI. One
-of your first tasks is to select a language model that will be used to
-interpret the sounds that you generate as words. The other
-key feature is that, under Scripting, there's a View Log pull-down
-that opens up a window displaying the log file. Whenever you make a
-change in a Talon configuration file, that change is implemented
-immediately. You do not have to restart Talon to get the change
-to take effect.
-
-[slide 14]
-This is an example of a Talon file. It has two components. It has a header
-above the dash that describes the scope of the commands contained
-below the dash. Commands are separated by blank lines. If a voice
-command is mapped to multiple actions, these are listed separately on
-indented lines below the first line. The words that are in square brackets
-are optional. So, I have mapped the phrase
-"toggle voice in" to the keyboard shortcut Alt-L in order to toggle Voice In
-on or off. If I toggle Voice In on, I need to immediately toggle off Talon,
-and this is done through the key command Control-T, which is
-mapped to speech toggle. Then there are a
-couple of other examples. The header is an optional
-feature of Talon files. If no header is present, the commands in the file apply in all
-situations, in all modes.
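The file layout described in the talk (an optional header, a dash, then commands separated by blank lines) can be illustrated with a small Python sketch. The sample text in `SAMPLE` is hypothetical and only loosely modeled on the commands mentioned in the talk. The function `parse_talon_text` is a teaching aid under those assumptions, not how Talon itself reads its files.

```python
# Hypothetical Talon-style text: a header, a dash, then two commands.
SAMPLE = """
os: mac
-
toggle voice in:
    key(alt-l)
    key(ctrl-t)

save file: key(cmd-s)
""".strip()


def parse_talon_text(text):
    """Split Talon-style text into (header_lines, commands)."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if "-" in stripped:
        cut = stripped.index("-")
        header, body = lines[:cut], lines[cut + 1:]
    else:
        header, body = [], lines  # no header: commands apply everywhere

    commands, block = [], []
    for line in body + [""]:  # the trailing "" flushes the last block
        if line.strip():
            block.append(line)
        elif block:
            # First line holds the spoken phrase and, optionally, one action.
            phrase, _, inline = block[0].partition(":")
            actions = [inline.strip()] if inline.strip() else []
            # Indented follow-up lines are further actions for the same phrase.
            actions += [extra.strip() for extra in block[1:]]
            commands.append((phrase.strip(), actions))
            block = []
    return [h.strip() for h in header if h.strip()], commands
```

Running `parse_talon_text(SAMPLE)` yields the header lines and a list of `(phrase, actions)` pairs, which makes the header/body split and the one-phrase-to-many-actions mapping easy to inspect.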
-
-[slide 15]
-Here we have two restrictions. These commands will only work when
-using the iTerm2 terminal emulator for the Mac, and then only when the
-title of the window in ccc has this particular address, which is what
-appears when I've logged into the supercomputer at the University of
-Oklahoma. One of the commands in this file is checkjobs. It's mapped
-to an alias, a bash alias called cj for "check jobs", which in turn is mapped
-to a script called checkjobs.sh that, when it's run, returns a listing of the
-pending and running jobs on the supercomputer in a format that I find
-pleasing. This backslash n after cj, the new line character, enters the
-command, so I don't have to do that as an additional step. Likewise,
-here's a similar setup for interacting with an Ubuntu virtual machine.
-
-[slide 16]
-In terms of picking up voice computing, these are my recommendations.
-You're going to run into more errors than you may like initially, and so
-you need some patience in dealing with those. And also, it'll take you
-a while to get your head wrapped around Talon and how it works. You'll
-definitely want to use custom commands to correct the errors or
-shortcomings of the language models. And you've seen how, by
-opening up projects by voice commands, you can reduce friction in
-terms of restarting work on a project. You've seen how Voice In is
-preferred for more accurate dictation. I think my error rate is about
-1 to 2 percent. That is, 1 to 2 out of 100 words are incorrect versus
-Talon Voice where I think the error rate is closer to 5 percent. I have
-put together a library of English contractions and their expansions for
-Talon too, and they can be found here on GitHub. And I also have posted
-a quiz of 600 questions about some basic Talon commands.
-
-[slide 17]
-I'd like to thank the people who've helped me out on the Talon Slack
-channel and members of the Oklahoma Data Science Workshop where
-I gave an hour-long talk on this topic several weeks ago. I'd like to thank
-my friends at the Berlin and Austin Emacs Meetup and at the M-x research
-Slack channel. And I thank these grant funding agencies for supporting
-my work. I'll be happy to take any questions.
-
[[!inline pages="internal(2023/info/voice-after)" raw="yes"]]
[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]