[[!meta title="Enhancing productivity with voice computing"]]
[[!meta copyright="Copyright © 2023 Blaine Mooers"]]
[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]

# Enhancing productivity with voice computing
Blaine Mooers (he/him/his) - Pronunciation: pronounced like "moors", blaine-mooers(at)ouhsc.edu, mastodon(at)bhmooers

[[!inline pages="internal(2023/info/voice-before)" raw="yes"]]

Voice computing uses speech recognition software to convert speech into text, commands, or code. While there is a venerated program called Emacspeak for converting text into speech, an "EmacsListens" for converting speech into text is not available yet. The Emacs Wiki describes the underdeveloped situation for speech-to-text in Emacs. I will explain how two external software packages convert my speech into text and computer commands that can be used with Emacs.

First, I present some motivations for using voice computing. These can be divided into two categories: productivity improvement and health-related issues. In the second category, there is the under-appreciated cure for "standing desk envy"; the cure is achievable with a large dose of voice computing while standing.

I found one software package, Voice In (Voice In Plus, ), to be quite accurate for speech-to-text or dictation but less versatile for speech-to-commands. I have used this package daily, and I found a three-fold increase in my daily word count almost immediately. Of course, there are limits here; you can talk for only so many hours per day. Second, I found another software package, Talon Voice (), that has a less accurate language model but supports custom commands that can be executed anywhere you can place the cursor, including in virtual machines and on remote servers. Talon Voice will appeal to those who like to tinker with configuration files, yet it is easy to use. I will explain how I have integrated these two packages into my workflow.

I have developed a library of commands that expand 94 English contractions when spoken. This library eliminates tedious downstream editing of formal prose, where I do not use contractions. The library is available on GitHub for both Voice In Plus () and Talon Voice (). I also supply interactive quizzes for mastering the basic Voice In commands () and the Talon Voice phonetic alphabet (). I learned the Talon alphabet in one day by taking the quiz at spaced intervals. The quiz took only 60 seconds to complete when I was proficient.

I gave a 60-minute talk on this topic to the Oklahoma Data Science Workshop on 2023 Nov. 16 (). This workshop meets once a month and is for people interested in data science and scientific computing. You do not have to be an Oklahoma resident to attend. Send me e-mail if you want to be added to our mailing list.

# About the speaker:

I am an Associate Professor of Biochemistry at the University of Oklahoma Health Sciences Center. I use X-ray crystallography to study the structures of RNA, proteins, and protein-drug complexes. I have been using Python and LaTeX for a dozen years, and Jupyter Notebooks since 2013. I have been using Emacs every day for 2.5 years. I discovered voice computing this summer when my chronic repetitive stress injury flared up while entering data in a spreadsheet. I tripled my daily word count by using speech-to-text, and I get a kick out of running remote computers by speech-to-command.

# Transcript

[slide 1] Hi, I'm Blaine Mooers.
I'm an associate professor of biochemistry at the University of Oklahoma Health Sciences Center in Oklahoma City. My lab studies the role of RNA structure in RNA editing. We use X-ray crystallography to study the structures of these RNAs. We spend a lot of time in the lab preparing our samples for structural studies, and then we also spend a lot of time at the computer analyzing the resulting data. I was seeking ways of using voice computing to try to enhance my productivity.

[slide 2] I divide voice computing into three activities: speech-to-text or dictation, speech-to-commands, and speech-to-code. I'll be talking about speech-to-text and speech-to-commands today because these are the two activities that are probably most broadly applicable to the workflows of people attending this conference.

[slide 3] This talk will not be about Emacspeak. That is a venerated program for converting text to speech. We're talking about the flow of information in the opposite direction, speech-to-text. We need an EmacsListens. We don't have one, so I had to seek help from outside the Emacs world via Voice In Plus. This runs in the Google Chrome web browser, and it's very good for speech-to-text and very easy to learn how to use. It also has some speech-to-commands. However, Talon Voice is much better with speech-to-commands, and it's also great at speech-to-code.

[slide 4] The motivations are, obviously, as I mentioned already, improved productivity. If you're a fast typist who types faster than you can speak, you might nonetheless benefit from voice computing when you grow tired of using the keyboard. On the other hand, you might be a slow typist who talks faster than you can type. In this case, you're definitely going to benefit from dictation because you'll be able to encode more words in text documents in a given day. If you're a coder, then you may get a kick out of opening programs, websites, and coding projects by using your voice. Then there are health-related reasons. You may have impaired use of your hands, eyes, or both due to accident or disease, or you may suffer from a repetitive stress injury. Many of us have a mild but chronic form of it. We can't take a three-month sabbatical from the keyboard without losing our jobs, so these injuries tend to persist. And you may have learned that it's not good for your health to sit for prolonged periods of time staring at a computer screen. You can actually dictate to your computer from 20 feet away while looking out the window, thereby giving your lower body and your eyes a break.

[slide 5] I'm not God, so I have to bring data. I have two data points here: the number of words that I wrote in June and July of this year and in September and October. I adopted voice computing in the middle of August. As you can see, I got an over three-fold increase in my output.

[slide 6] This is the Chrome Web Store page for Voice In. It's only available for Google Chrome. You just hit the install button to install it. To configure it, you need to select a language. It has support for 40 languages, and it supports about a dozen different dialects of English, including Australian.

[slide 7] It works on web pages with text areas. I use it regularly on Overleaf and 750words.com, a distraction-free environment for writing. It also works in webmails. It works in Google. It works in JupyterLab, of course, because that runs in the browser. It also works in Jupyter Notebook and Colab notebooks.
It should work in Cloudmacs. I've mapped option-L to opening Voice In when the cursor is on a web page that has a text area. The presence of a text area is the main limiting factor.

[slide 8] Voice In has a number of built-in commands. You can turn it off by saying "stop dictation". It doesn't distinguish between a command mode and a dictation mode. It has an undo command. You use the command "copy that" to copy the selection. You say "press enter" to issue a command or submit text that has been written in a web form, and "press tab" will open up the next tab in a web browser. The scroll up and scroll down commands let you navigate a web page. I've put together a quiz about these commands so that you can go through it several times, until you get at least 90 percent of the questions correct, in order to boost your recall of the commands. The quiz is a Python script that you can probably pound through in less than a minute once you know the commands. I also provide an Elisp version of this quiz, but it's a little slower to operate.

[slide 9] These are some common errors that I've run into with Voice In. It likes to contract statements like "I will" into "I'll". Contractions are not used in formal writing, and most of my writing is formal writing, so this annoys me. I will show you how I corrected that problem. It also drops the first word of sentences quite often; this might be some speech issue that I have. It inserts the wrong word when the intended word is not in the dictionary that was used to train it. For example, PyMOL is the name of a molecular graphics program that we use in our field. Voice In doesn't recognize PyMOL; instead, it substitutes the word "primal". Since I don't use "primal" very often, I've mapped the word "primal" to "PyMOL" in some custom commands I'll talk about in a minute. Then there's the problem that existing commands might get executed when you speak them when, in fact, you wanted to use those words in your dictation. This is a pitfall of Voice In: it doesn't have a command mode that is separate from its dictation mode.

[slide 10] Through a very easy-to-use GUI, you can set up custom voice commands mapped to the text you want inserted, so this is how misinterpreted words can be corrected: you just map the misinterpreted word to the intended word. You can also map contractions to their expansions. I did this for 94 English contractions, and you can find these on GitHub. You can also insert acronyms and expand those acronyms. I apply the same approach to the first names of colleagues. I say "expand Fred", for example, to get Fred's first and last name with the correct spelling of his very long German name. You can also insert other trivia like favorite URLs. You can insert LaTeX snippets; it handles multi-line snippets correctly, you just have to enclose them in double quotes. You can even insert BibTeX cite keys for references that you use frequently. All fields have certain key references for certain methods or topics.

[slide 11] Then it has a set of commands that you can customize for the purpose of speech-to-commands, to get the computer to do something like open up a specific website or save the current writing. In this case, we have "press: command-s" for saving the current writing. You can change the language with "lang:", and you can change the case of the text with "case:".
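As an illustration of the mappings described above, here is a small, hypothetical set of custom commands written as "spoken phrase, text to insert" pairs, one per line (the pair format is an assumption about how bulk entry looks; the colleague's surname and the LaTeX snippet are placeholders, while the contraction expansion, the primal-to-PyMOL fix, and "press: command-s" follow the talk):

```
I'll, I will
primal, PyMOL
expand fred, Fred Mustermann
insert equation, "\begin{equation}
E = mc^{2}
\end{equation}"
save the writing, press: command-s
```

As mentioned in the talk, a multi-line insertion such as the LaTeX snippet is enclosed in double quotes, and the "press:" prefix sends a keyboard shortcut instead of inserting text.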
[slide 12] But the speech-to-command repertoire is quite limited in Voice In, so it's now time to turn to Talon Voice. This is an open-source project. It's free. It is highly configurable via TalonScript, which is a subset of Python. You can use either TalonScript or Python to configure it, but it's easier to code up your configuration in TalonScript. It has a Python interpreter embedded in it, so you don't have to mess around with installing yet another Python interpreter. It runs on all platforms, and it has a dictation mode that's separate from its command mode. When you activate it, it starts out asleep but listening for the wake command. You just bark out "Talon Wake" to wake it up and "Talon Sleep" to put it back to sleep. It has a very welcoming community in the Talon Slack channel. Then I need to point out that there are several packages that others have developed that run on top of Talon; one of particular note, Cursorless, is by Pokey Rule. He has on his website some really well-done videos that demonstrate how he uses Cursorless to move the cursor around using voice commands. This, however, runs on VS Code; at least that's the text editor for which he's primarily developing Cursorless.

[slide 13] I followed the install protocol outlined by Tara Roys. She has a collection of tutorials on YouTube as well as on GitHub that are quite helpful. I followed her tutorial for installing Talon on macOS without any issues, but allow half an hour to an hour to go through the process. When you're done, you'll have the Talon icon in the toolbar on the Mac. When it has a diagonal line across it, that means it's in the sleep state. Clicking it leads to cascading pull-down menus, and that is it for the GUI. One of your first tasks is to select a language model that will be used to interpret the sounds you generate as words. The other key feature is that, under Scripting, there is a View Log item that opens a window displaying the log file. Whenever you make a change in a Talon configuration file, that change is implemented immediately. You do not have to restart Talon to get the change to take effect.

[slide 14] This is an example of a Talon file. It has two components: a header above the dash, which describes the scope of the commands, and the commands themselves below the dash. Each command is separated by a blank line. If a voice command is mapped to multiple actions, these are listed on separate indented lines below the first line. Words in square brackets are optional. I have mapped the phrase "toggle voice in" to the keyboard shortcut Alt-L in order to toggle Voice In on or off. If I toggle Voice In on, I need to immediately toggle off Talon, and this is done through a key binding for Control-T, which is mapped to speech toggle. Then there are a couple of other examples. If no header is present (the header is an optional feature of Talon files), then the commands in the file apply in all situations, in all modes.

[slide 15] Here we have two restrictions: these commands will only work when using the iTerm2 terminal emulator for the Mac, and then only when the title of the window contains a particular address, which is what appears when I've logged into the supercomputer at the University of Oklahoma. One of the commands in this file is checkjobs.
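A minimal sketch of a Talon file with the structure just described; everything in it is illustrative rather than copied from the slide. The "os: mac" header is an arbitrary scope, key(alt-l) stands in for the Alt-L browser shortcut mentioned above, and speech.disable() and speech.toggle() are assumed to be the built-in actions that control Talon's listening state.

```
# Header above the dash: an illustrative scope restricting these commands to macOS.
os: mac
-
# Words in square brackets are optional, so "toggle voice" and "toggle voice in" both match.
# A command mapped to multiple actions lists them on indented lines below the first line.
toggle voice [in]:
    key(alt-l)
    speech.disable()

# A hotkey binding rather than a voice command: Control-T toggles Talon's speech engine
# (assumed action name).
key(ctrl-t): speech.toggle()
```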
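And a sketch of the scoped file described for slide 15, again under assumptions: the app.name matcher, the window-title pattern, and the exact spelling of the spoken phrase are placeholders, not the actual values from the talk.

```
# Only active in iTerm2, and only when the window title matches the remote host
# (placeholder pattern below; replace it with the address shown in your window title).
app.name: iTerm2
title: /user@supercomputer/
-
# Speaking "check jobs" types the bash alias cj followed by a newline, which runs it.
check jobs: insert("cj\n")
```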
It's mapped to a bash alias called cj, for "check jobs", which in turn points to a script called checkjobs.sh that, when run, returns a listing of the pending and running jobs on the supercomputer in a format that I find pleasing. The backslash n after cj, the newline character, enters the command, so I don't have to do that as an additional step. Likewise, here's a similar setup for interacting with an Ubuntu virtual machine.

[slide 16] In terms of picking up voice computing, these are my recommendations. You're going to run into more errors than you may like initially, so you need some patience in dealing with them. Also, it'll take you a while to get your head wrapped around Talon and how it works. You'll definitely want to use custom commands to correct the errors or shortcomings of the language models. You've seen how, by opening up projects with voice commands, you can reduce the friction of restarting work on a project. You've also seen how Voice In is preferred for more accurate dictation: I think my error rate is about 1 to 2 percent, that is, 1 to 2 out of 100 words are incorrect, versus Talon Voice, where I think the error rate is closer to 5 percent. I have put together a library of English contractions and their expansions for Talon too, and they can be found here on GitHub. I have also posted a quiz of 600 questions about some basic Talon commands.

[slide 17] I'd like to thank the people who've helped me out on the Talon Slack channel and the members of the Oklahoma Data Science Workshop, where I gave an hour-long talk on this topic several weeks ago. I'd like to thank my friends at the Berlin and Austin Emacs meetups and at the M-x Research Slack channel. And I thank these grant funding agencies for supporting my work. I'll be happy to take any questions.

[[!inline pages="internal(2023/info/voice-after)" raw="yes"]]
[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]