Added transcript with slides

author: Blaine Mooers <bmooers1@gmail.com> 2023-12-04 13:10:20 -0600
committer: Blaine Mooers <bmooers1@gmail.com> 2023-12-04 13:10:20 -0600
commit: 5cb970ee7b92483f67288f29027afa437c7026d7 (patch)
tree: 822492990f455b3f3b2e35f1f31c964f3dac1260
parent: b5e397f1bedea9705b9705f167b98f03b6249695 (diff)
download: emacsconf-wiki-5cb970ee7b92483f67288f29027afa437c7026d7.tar.xz
emacsconf-wiki-5cb970ee7b92483f67288f29027afa437c7026d7.zip
1 files changed, 218 insertions, 1 deletions
diff --git a/2023/talks/voice.md b/2023/talks/voice.md
index eeac32d0..3eee45df 100644
--- a/2023/talks/voice.md
+++ b/2023/talks/voice.md
@@ -48,7 +48,7 @@ I also supply the interactive quizzes for mastering the basic Voice In commands
 I learned the Talon alphabet in one day by taking the quiz at spaced intervals.
 The quiz took only 60 seconds to complete when I was proficient.
 
-About the speaker:
+# About the speaker:
 
 I am an Associate Professor of Biochemistry at the University of
 Oklahoma Health Sciences Center. I use X-ray crystallography to study
@@ -60,6 +60,223 @@ stress injury flared up while entering data in a spreadsheet. I
 tripled my daily word count by using the speech-to-text, and I get a
 kick out of running remote computers by speech-to-command.
 
+# Transcript
+
+[slide 1]
+Hi, I'm Blaine Mooers. I'm an associate professor of biochemistry at
+ the University of Oklahoma Health Sciences Center in Oklahoma City. 
+ My lab studies the role of RNA structure in RNA editing. We use X-ray 
+ crystallography to study the structures of these RNAs. We spend a lot 
+ of time in the lab preparing our samples for structural studies, and then
+ we also spend a lot of time at the computer analyzing the resulting data.
+ I was seeking ways of using voice computing to try to enhance
+  my productivity. 
+
+[slide 2]
+I divide voice computing into three activities: speech-to-text or dictation, 
+speech-to-commands, and speech-to-code. I'll be talking about 
+speech-to-text and speech-to-commands today because these are two 
+activities that are probably most broadly applicable to the workflows of 
+people attending this conference. 
+
+[slide 3]
+This talk will not be about Emacspeak, a venerated program for 
+converting text to speech. We're talking about the flow of information in 
+the opposite direction, speech-to-text. We need an "Emacs Listens". We 
+don't have one, so I had to seek help from outside the Emacs world via
+ Voice In Plus. This runs in the Google Chrome web browser, and it's 
+ very good for speech-to-text and very easy to learn how to use. It also 
+ supports some speech-to-commands. However, Talon Voice is much better 
+ at speech-to-commands, and it's also great at speech-to-code. 
+
+[slide 4]
+The first motivation, as I mentioned already, is improved 
+productivity. If you're a fast typist who types faster than you can speak, 
+you might still benefit from voice computing when you 
+grow tired of using the keyboard. On the other hand, you might be a 
+slow typist who talks faster than you can type. In this case, you're 
+definitely going to benefit from dictation because you'll be able to get 
+more words into text documents in a given day. If you're a coder, 
+then you may get a kick out of opening programs, websites, and 
+coding projects by using your voice. 
+
+[slide 5]
+Then there are health-related reasons. You may have impaired use 
+of your hands, eyes, or both due to accident or disease, or you may 
+suffer from a repetitive stress injury. Many of us have a mild 
+but chronic form of it. We can't take a three-month sabbatical from 
+the keyboard without losing our jobs, so these injuries tend to persist. 
+You may also have learned that it's not good for your health to 
+sit for prolonged periods of time staring at a computer screen. 
+You can actually dictate to your computer from 20 feet away while 
+looking out the window, thereby giving your lower body and your 
+eyes a break. 
+
+[slide 5]
+I'm not God, so I have to bring data. I have two data points here: 
+the number of words that I wrote in June and July this year, and in 
+September and October. I adopted voice computing in 
+the middle of August. As you can see, I got more than a three-fold increase
+ in my output. 
+
+[slide 6]
+This is the Chrome Web Store page for Voice In. It's only available 
+for Google Chrome. You just hit the install button to install it. To configure it, 
+you need to select a language. It supports 40 languages, and 
+it supports about a dozen different dialects of English, including Australian English. 
+
+[slide 7]
+It works on web pages that have text areas. I use it regularly on 
+Overleaf and 750words.com, a distraction-free environment for writing.
+ It also works in webmail and in Google. It works in JupyterLab, 
+ of course, because that runs in the browser. It also works in Jupyter 
+ Notebook and in Colab notebooks. It should work in Cloudmacs. I've 
+ mapped Option-L to opening Voice In when the cursor is on a web page 
+ that has a text area. The presence of a text area is the main limiting factor. 
+
+[slide 8]
+Voice In has a number of built-in commands. You can turn it off by saying 
+"stop dictation". It doesn't distinguish between a command mode and 
+a dictation mode. It has an undo command. The command 
+"copy that" copies a selection. You say "press enter" to issue a 
+command or submit text that has been written in a web form, and 
+"press tab" will open up the next tab in a web browser. The "scroll up" 
+and "scroll down" commands let you navigate a web page. I've put together a quiz 
+about these commands so that you can go through it several 
+times until you get at least 90 percent of the questions correct. 
+To boost your recall of the commands, I wrote a Python script for the quiz 
+that you can probably pound through in less than a minute, 
+once you know the commands. I also 
+provide an Elisp version of this quiz, but it's a little slower to operate. 
+
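A minimal Python sketch of such a quiz is shown here. It is not the author's script: the prompts are the built-in commands named above, the 90 percent pass mark comes from the talk, and the names `COMMANDS`, `score`, `run_quiz`, and `PASS_MARK` are hypothetical. Scoring is kept separate from the interactive loop so it can be checked without typing answers.

```python
"""Minimal sketch of a recall quiz for Voice In built-in commands.

Not the author's script: the commands come from the talk, and the
90 percent pass mark is the target the talk suggests. All names are
hypothetical.
"""
import random
import time

# Prompt -> spoken command, using commands named in the talk.
COMMANDS = {
    "Turn off dictation": "stop dictation",
    "Copy the current selection": "copy that",
    "Submit text in a web form": "press enter",
    "Move up the web page": "scroll up",
    "Move down the web page": "scroll down",
}

PASS_MARK = 0.9  # repeat the quiz until at least 90 percent are correct


def score(answers: dict[str, str]) -> float:
    """Return the fraction of prompts answered with the right command."""
    correct = sum(
        1
        for prompt, command in COMMANDS.items()
        if answers.get(prompt, "").strip().lower() == command
    )
    return correct / len(COMMANDS)


def run_quiz() -> None:
    """Ask every prompt once, in random order, then report score and time."""
    prompts = list(COMMANDS)
    random.shuffle(prompts)
    start = time.monotonic()
    answers = {prompt: input(f"{prompt}? ") for prompt in prompts}
    elapsed = time.monotonic() - start
    fraction = score(answers)
    verdict = "pass" if fraction >= PASS_MARK else "try again later"
    print(f"{fraction:.0%} correct in {elapsed:.0f} s: {verdict}")
```

Call `run_quiz()` from a terminal; taking it again at spaced intervals is what builds recall, and the elapsed time shows when the commands have become automatic.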
+[slide 9]
+These are some common errors that I've run into with Voice In. It 
+likes to contract phrases like "I will" into "I'll". Contractions are not 
+used in formal writing, and most of my writing is formal writing, so 
+this annoys me. I will show you how I corrected for that problem. It 
+also drops the first word of sentences quite often. This might be due to a 
+speech issue that I have. It inserts the wrong word when the intended word 
+is not in the dictionary that was used to train it. For example, 
+PyMOL is the name of a molecular graphics program that we use in 
+our field. Voice In doesn't recognize PyMOL; instead, it substitutes the 
+word "primal". Since I don't use "primal" very often, I've mapped the 
+word "primal" to "PyMOL" in some custom commands I'll talk about 
+in a minute. Finally, the existing commands might 
+get executed when you speak them when, in fact, you wanted to use 
+those words in your dictation. This is a pitfall 
+of Voice In: it doesn't have a command mode that's 
+separate from a dictation mode. 
+
+[slide 10]
+Through a very easy-to-use GUI, you can set up custom voice 
+commands mapped to the text that you want inserted. This is how 
+misinterpreted words can be corrected: you just map the misinterpreted 
+word to the intended word. You can also map the contractions to their 
+expansions. I did this for 94 English contractions, and you can find 
+these on GitHub. You can also insert acronyms and expand those 
+acronyms. I apply the same approach to the first names of colleagues. 
+I say "expand Fred", for example, to get Fred's first and last name 
+with the correct spelling of his very long German name. You can also 
+insert other trivia like favorite URLs. You can insert LaTeX snippets. 
+It handles multi-line snippets correctly; you just have to enclose them 
+in double quotes. You can even insert BibTeX cite keys for references
+that you use frequently. Every field has key references for certain 
+methods or topics. 
+
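The custom-command idea can be modeled as a lookup table from the misheard or contracted form to the intended text. The Python sketch here applies such a table to already-dictated text with one regular expression. Voice In performs its substitutions internally, so this is only an illustration; apart from "primal" to "PyMOL" and "I'll" to "I will", the entries are assumed examples, not the author's 94-entry list, and `CORRECTIONS` and `correct` are hypothetical names.

```python
"""Sketch: apply "heard form -> intended text" corrections to dictated text.

Illustrative post-processing only; Voice In applies its own custom
commands. Entries other than "primal" and "I'll" are assumed examples.
"""
import re

CORRECTIONS = {
    "primal": "PyMOL",   # misrecognized program name (from the talk)
    "I'll": "I will",    # contraction -> expansion (from the talk)
    "don't": "do not",   # assumed example
    "it's": "it is",     # assumed example
    "can't": "cannot",   # assumed example
}

# One alternation with longest keys first, so longer phrases win over prefixes.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(CORRECTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
_LOOKUP = {key.lower(): value for key, value in CORRECTIONS.items()}


def _replace(match: re.Match) -> str:
    word = match.group(0)
    replacement = _LOOKUP[word.lower()]
    # Keep a sentence-initial capital, e.g. "It's" -> "It is".
    if word[0].isupper() and replacement[0].islower():
        replacement = replacement[0].upper() + replacement[1:]
    return replacement


def correct(text: str) -> str:
    """Return text with every mapped form replaced by its intended text."""
    return _PATTERN.sub(_replace, text)
```

Matching on word boundaries (`\b`) keeps "primal" from being rewritten inside a longer word, and sorting the keys longest-first lets a multi-word phrase take priority over a shorter key that starts it.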
+[slide 11]
+Voice In also has a set of commands that you can customize for 
+speech-to-commands, to get the computer to do something like open 
+up a specific website or save the current writing. In this case, we have 
+"press: command-s" for saving the current writing. You can change the 
+language with "lang:", and you can change the case of the text with "case:". 
+
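One way to picture these prefixed values is as a small dispatch rule: the text before the first colon selects an action, and anything without a recognized prefix is inserted verbatim. This is a hypothetical Python model of the behavior described on the slide, not Voice In's implementation; only the `press:`, `lang:`, and `case:` prefixes come from the talk, while `Action`, `parse_action`, and the `"insert"` fallback are invented names.

```python
"""Hypothetical model of prefixed Voice In custom-command values.

Only the prefixes "press:", "lang:", and "case:" come from the talk;
the parsing rules are assumptions made for illustration.
"""
from typing import NamedTuple

ACTION_PREFIXES = frozenset({"press", "lang", "case"})


class Action(NamedTuple):
    kind: str      # "press", "lang", "case", or "insert"
    argument: str  # keystroke, language, case style, or literal text


def parse_action(value: str) -> Action:
    """Split "prefix: argument"; fall back to inserting the value as text."""
    head, sep, rest = value.partition(":")
    prefix = head.strip().lower()
    if sep and prefix in ACTION_PREFIXES:
        return Action(prefix, rest.strip())
    return Action("insert", value)
```

Checking the prefix against a fixed set matters because ordinary inserted text, such as a favorite URL, also contains colons and must still be inserted unchanged.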
+[slide 12]
+But the speech-to-command repertoire is quite limited in Voice In, 
+so now it's time to pick up Talon Voice. This is an open source project. 
+It's free. It is highly configurable via TalonScript, which is a subset of 
+Python. You can use either TalonScript or Python to configure it, but it's 
+easier to code up your configuration in TalonScript. It has a Python 
+interpreter embedded in it, so you don't have to mess around with 
+installing yet another Python interpreter. It runs on all major platforms, and it 
+has a dictation mode that's separate from a command mode. When you 
+activate it, it sits asleep in a listening state. You just bark out 
+"Talon Wake" to wake it up, and "Talon Sleep" to send it back into 
+that listening state. It has a very welcoming community in the Talon Slack 
+channel. I also need to point out that others have developed several packages 
+that run on top of Talon. One of particular note 
+is Cursorless by Pokey Rule. He has on his website some really well-done videos 
+that demonstrate how he uses Cursorless to move the cursor around 
+using voice commands. Cursorless, however, runs in VS Code. At least that's 
+the text editor for which he's primarily developing Cursorless. 
+
+[slide 13]
+I followed the install protocol outlined by Tara Roys. She has a collection 
+of tutorials on YouTube as well as on GitHub that are quite helpful. I 
+followed her tutorial for installing Talon on macOS without any issues, 
+but allow for half an hour to an hour to go through the process. When 
+you're done, the Talon icon will appear in the menu bar on the Mac. 
+When it has this diagonal line across it, that means it's in the sleep state. 
+Clicking the icon opens cascading pull-down menus, and that is all there is to the GUI. One 
+of your first tasks is to select a language model that will be used to 
+interpret the sounds that you generate as words. The other 
+key feature is the View Log item under the Scripting menu, which 
+opens a window displaying the log file. Whenever you make a 
+change in a Talon configuration file, that change is implemented 
+immediately. You do not have to restart Talon to get the change 
+to take effect. 
+
+[slide 14]
+This is an example of a Talon file. It has two components: a header 
+above the dash that describes the scope of the commands contained 
+below the dash, and the commands themselves. Each command is separated by a blank line. If a voice 
+command is mapped to multiple actions, these are listed separately on
+indented lines below the first line. The words that are in square brackets
+are optional. I have mapped the phrase 
+"toggle voice in" to the keyboard shortcut Alt-L in order to toggle 
+Voice In on or off. If I toggle Voice In on, I need to immediately toggle off Talon, 
+and this is done through the key command Control-T, which is 
+mapped to speech toggle. There are a 
+couple of other examples. The header is optional; 
+if no header is present, the commands in the file will apply in all 
+situations, in all modes. 
+
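To make the layout concrete, here is a deliberately simplified Python parser for the structure just described: header lines above a lone dash, one command per `phrase:` line, indented lines for extra actions, and `[word]` for an optional word. It is not Talon's parser, which also supports captures, lists, and alternatives. The sample file, including its `os: mac` header and `save file` command, is a hypothetical stand-in for the slide, and `parse_talon` and `expand_optional` are invented names.

```python
"""Simplified sketch of the .talon layout described in the talk.

Assumptions: the header ends at a line holding only "-", a command
starts at column 0 as "spoken phrase: action", indented lines add more
actions, and "[word]" marks an optional word. Real Talon syntax is richer.
"""
import itertools
import re

# Hypothetical file in the style of the slide (not the author's file).
SAMPLE = """\
os: mac
-
toggle voice in:
    key(alt-l)
    key(ctrl-t)

[please] save file: key(cmd-s)
"""


def parse_talon(text: str) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Return (header, commands); commands maps each phrase to its actions."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    header: dict[str, str] = {}
    if "-" in stripped:  # the header is optional
        cut = stripped.index("-")
        for line in lines[:cut]:
            if ":" in line:
                key, value = line.split(":", 1)
                header[key.strip()] = value.strip()
        lines = lines[cut + 1:]

    commands: dict[str, list[str]] = {}
    current = None
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments carry no commands
        if line[0].isspace():  # indented: another action for the current command
            if current is not None:
                commands[current].append(line.strip())
            continue
        if ":" not in line:
            continue  # this sketch ignores malformed lines
        phrase, action = line.split(":", 1)
        current = phrase.strip()
        commands[current] = [action.strip()] if action.strip() else []
    return header, commands


def expand_optional(phrase: str) -> list[str]:
    """List every spoken form of a phrase whose [bracketed] words are optional."""
    parts = re.split(r"(\[[^\]]+\])", phrase)
    choices = [
        [part[1:-1], ""] if part.startswith("[") and part.endswith("]") else [part]
        for part in parts
    ]
    forms = ("".join(combo) for combo in itertools.product(*choices))
    return sorted({" ".join(form.split()) for form in forms})
```

For the sample, `parse_talon(SAMPLE)` yields the header `{"os": "mac"}` and two commands, with "toggle voice in" mapped to both key presses, and `expand_optional("[please] save file")` returns `["please save file", "save file"]`.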
+[slide 15]
+Here we have two restrictions. These commands will only work when 
+using the iTerm2 terminal emulator for the Mac, and then only when the 
+title of the window has this particular address, which is what 
+appears when I've logged into the supercomputer at the University of 
+Oklahoma. One of the commands in this file is "check jobs". It's mapped 
+to a bash alias called cj, for "check jobs", which in turn runs 
+a script called checkjobs.sh that returns a listing of the 
+pending and running jobs on the supercomputer in a format that I find 
+pleasing. The backslash n after cj, the newline character, enters the 
+command, so I don't have to do that as an additional step. Likewise, 
+here's a similar setup for interacting with an Ubuntu virtual machine. 
+
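The same kind of restriction can be sketched in plain Python: a command set is active only when both the application name and a window-title pattern match, and the mapped keystrokes end in `"\n"` so Return is pressed automatically. The application name and the "check jobs" to `cj` mapping come from the talk, but the host-name regex, the `CommandContext` class, and `keystrokes_for` are hypothetical. Talon evaluates its own headers, and this code does not reproduce that logic.

```python
"""Sketch of context-restricted voice commands (not Talon's matching code)."""
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommandContext:
    app: str            # required application name
    title_pattern: str  # regular expression searched for in the window title

    def matches(self, app: str, title: str) -> bool:
        return app == self.app and re.search(self.title_pattern, title) is not None


# Hypothetical: only iTerm2 windows whose title shows a cluster login host.
HPC = CommandContext(app="iTerm2", title_pattern=r"@login\d*\.hpc\.example\.edu")

# Spoken phrase -> text to type; the trailing newline submits the command.
HPC_COMMANDS = {"check jobs": "cj\n"}


def keystrokes_for(phrase: str, app: str, title: str) -> Optional[str]:
    """Return the text to type for a phrase, or None when it should not fire."""
    if not HPC.matches(app, title):
        return None
    return HPC_COMMANDS.get(phrase)
```

A second `CommandContext` with its own title pattern would cover the Ubuntu virtual machine in the same way, keeping each set of commands silent outside its window.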
+[slide 16]
+In terms of picking up voice computing, these are my recommendations. 
+You're going to run into more errors than you may like initially, so 
+you need some patience in dealing with them. Also, it'll take you 
+a while to get your head wrapped around Talon and how it works. You'll 
+definitely want to use custom commands to correct the errors or 
+shortcomings of the language models. And you've seen how, by 
+opening up projects with voice commands, you can reduce friction when 
+restarting work on a project. I prefer Voice In for dictation because 
+it is more accurate. I think my error rate with it is about 
+1 to 2 percent, that is, 1 to 2 out of 100 words are incorrect, versus 
+Talon Voice, where I think the error rate is closer to 5 percent. I have 
+put together a library of English contractions and their expansions for 
+Talon too, and they can be found here on GitHub. I have also posted 
+a quiz of 600 questions about some basic Talon commands.
+
+[slide 17]
+I'd like to thank the people who've helped me out on the Talon Slack 
+channel and members of the Oklahoma Data Science Workshop where 
+I gave an hour-long talk on this topic several weeks ago. I'd like to thank 
+my friends at the Berlin and Austin Emacs Meetups and at the M-x research 
+Slack channel. And I thank these grant funding agencies for supporting 
+my work. I'll be happy to take any questions.
+
 [[!inline pages="internal(2023/info/voice-after)" raw="yes"]]
 
 [[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]