WEBVTT captioned by sachac NOTE Introduction 00:00:00.000 --> 00:00:04.359 Hi, I'm Blaine Mooers. I'm an associate professor 00:00:04.360 --> 00:00:06.519 of biochemistry at the University of Oklahoma 00:00:06.520 --> 00:00:09.319 Health Sciences Center in Oklahoma City. 00:00:09.320 --> 00:00:12.959 My lab studies the role of RNA structure in RNA editing. 00:00:12.960 --> 00:00:17.199 We use X-ray crystallography to study the structures 00:00:17.200 --> 00:00:19.919 of these RNAs. We spend a lot of time in the lab 00:00:19.920 --> 00:00:22.719 preparing our samples for structural studies, 00:00:22.720 --> 00:00:26.719 and then we also spend a lot of time at the computer 00:00:26.720 --> 00:00:29.719 analyzing the resulting data. 00:00:29.720 --> 00:00:33.039 I was seeking ways of using voice computing 00:00:33.040 --> 00:00:37.399 to try to enhance my productivity. NOTE Three activities in voice computing 00:00:37.400 --> 00:00:41.319 I divide voice computing into three activities, 00:00:41.320 --> 00:00:44.959 speech-to-text or dictation, speech-to-commands, 00:00:44.960 --> 00:00:47.639 and speech-to-code. I'll be talking about 00:00:47.640 --> 00:00:50.159 speech-to-text and speech-to-commands today 00:00:50.160 --> 00:00:55.079 because these are two activities 00:00:55.080 --> 00:00:57.319 that are probably most broadly applicable 00:00:57.320 --> 00:01:02.559 to the workflows of people attending this conference. NOTE Talk is not about ... and about ... 00:01:02.560 --> 00:01:06.799 This talk will not be about Emacspeak. 00:01:06.800 --> 00:01:11.359 This is a venerated program for converting text to speech. 00:01:11.360 --> 00:01:13.319 We're talking about the flow of information 00:01:13.320 --> 00:01:16.519 in the opposite direction, speech-to-text. 00:01:16.520 --> 00:01:20.599 We need an Emacs Listens. We don't have one, 00:01:20.600 --> 00:01:25.479 so I had to seek help from outside the Emacs world 00:01:25.480 --> 00:01:30.639 via the Voice In Plus. 
This runs in 00:01:30.640 --> 00:01:33.639 the Google Chrome web browser, 00:01:33.640 --> 00:01:36.719 and it's very good for speech-to-text 00:01:36.720 --> 00:01:39.519 and very easy to learn how to use. 00:01:39.520 --> 00:01:41.999 It also has some speech-to-commands. 00:01:42.000 --> 00:01:44.799 However, Talon Voice is much better 00:01:44.800 --> 00:01:47.559 at speech-to-commands, 00:01:47.560 --> 00:01:53.519 and it's also great at speech-to-code. NOTE Motivations 00:01:53.520 --> 00:01:57.239 The motivations are, obviously, as I mentioned already, 00:01:57.240 --> 00:01:59.159 for improved productivity. 00:01:59.160 --> 00:02:00.399 So, if you're a fast typist 00:02:00.400 --> 00:02:05.199 who types faster than they can speak, 00:02:05.200 --> 00:02:07.079 you might nonetheless benefit 00:02:07.080 --> 00:02:09.279 from voice computing when you grow tired of 00:02:09.280 --> 00:02:12.199 using the keyboard. On the other hand, 00:02:12.200 --> 00:02:15.199 you might be a slow typist who talks faster 00:02:15.200 --> 00:02:17.519 than they can type. 00:02:17.520 --> 00:02:19.759 In this case, you're definitely going to 00:02:19.760 --> 00:02:22.859 benefit from dictation because you'll be able to 00:02:22.860 --> 00:02:29.359 encode more words in text documents in a given day. 00:02:29.360 --> 00:02:33.639 If you're a coder, then you may get a kick out of 00:02:33.640 --> 00:02:36.999 opening programs and websites and coding projects 00:02:37.000 --> 00:02:39.279 by using your voice. 00:02:39.280 --> 00:02:41.719 Then there are health-related reasons. 00:02:41.720 --> 00:02:44.599 You may have impaired use of your hands, eyes, or both 00:02:44.600 --> 00:02:49.199 due to accident or disease, or you may suffer from 00:02:49.200 --> 00:02:53.519 a repetitive stress injury. Many of us have this 00:02:53.520 --> 00:02:55.759 in a mild but chronic form.
00:02:55.760 --> 00:02:59.039 We can't take a three-month sabbatical from the keyboard 00:02:59.040 --> 00:03:05.519 without losing our jobs, so these injuries tend to persist. 00:03:05.520 --> 00:03:06.679 And then you may have learned 00:03:06.680 --> 00:03:09.959 that it's not good for your health to sit 00:03:09.960 --> 00:03:11.919 for prolonged periods of time 00:03:11.920 --> 00:03:14.919 staring at a computer screen. 00:03:14.920 --> 00:03:21.799 You can actually dictate to your computer from 20 feet away 00:03:21.800 --> 00:03:24.999 while looking out the window, 00:03:25.000 --> 00:03:27.779 thereby giving your lower body a break 00:03:27.780 --> 00:03:33.239 and your eyes a break. NOTE Data 00:03:33.240 --> 00:03:35.639 I'm not God, so I have to bring data. 00:03:35.640 --> 00:03:38.039 I have two data points here, 00:03:38.040 --> 00:03:42.399 the number of words that I wrote in June and July this year 00:03:42.400 --> 00:03:45.159 and in September and October. 00:03:45.160 --> 00:03:49.519 I adopted the use of voice computing 00:03:49.520 --> 00:03:53.919 in the middle of August. As you can see, 00:03:53.920 --> 00:03:58.679 I got a more than three-fold increase in my output. NOTE Voice In in the Chrome Store 00:03:58.680 --> 00:04:07.119 So this is the Chrome store website for Voice In. 00:04:07.120 --> 00:04:11.119 It's only available for Google Chrome. 00:04:11.120 --> 00:04:13.239 You just hit the install button to install it. 00:04:13.240 --> 00:04:16.639 To configure it, you need to select a language. 00:04:16.640 --> 00:04:19.559 It has support for 40 languages 00:04:19.560 --> 00:04:23.119 and it supports about a dozen different dialects of English, 00:04:23.120 --> 00:04:25.627 including Australian. NOTE Works in web pages with text areas 00:04:25.628 --> 00:04:29.959 It works on web pages with text areas, 00:04:29.960 --> 00:04:33.319 so it works.
I use it regularly 00:04:33.320 --> 00:04:37.879 on Overleaf and 750words.com, 00:04:37.880 --> 00:04:42.279 a distraction-free environment for writing. 00:04:42.280 --> 00:04:46.239 It also works in webmail. It works in Google. 00:04:46.780 --> 00:04:51.319 It works in JupyterLab, of course, 00:04:51.320 --> 00:04:52.879 because that runs in the browser. 00:04:52.880 --> 00:04:57.999 It also works in Jupyter Notebook and Colab Notebook. 00:04:58.000 --> 00:05:01.319 It should work in Cloudmacs. 00:05:01.320 --> 00:05:04.159 I've mapped Option-L to opening Voice In 00:05:04.160 --> 00:05:09.119 when the cursor is on a web page that has a text area. 00:05:09.120 --> 00:05:16.879 So [the presence of a text area is] the main limiting factor. NOTE Built-in commands in Voice In Plus 00:05:16.880 --> 00:05:19.159 [Voice In] has a number of built-in commands. 00:05:19.160 --> 00:05:24.879 You can turn it off by saying "stop dictation". 00:05:24.880 --> 00:05:26.119 It doesn't distinguish between 00:05:26.120 --> 00:05:28.799 a command mode and a dictation mode. 00:05:28.800 --> 00:05:33.599 It has an undo command. You use the command 00:05:33.600 --> 00:05:36.919 "copy that" to copy a selection. 00:05:36.920 --> 00:05:40.079 The "press" commands are used in the browser. 00:05:40.080 --> 00:05:44.839 You [say] "press enter" to issue a command or [submit] text 00:05:44.840 --> 00:05:50.319 that has been written in a web form, 00:05:50.320 --> 00:05:55.279 and then "press tab" will open up the next tab 00:05:55.280 --> 00:05:58.599 in a web browser. The "scroll up" and "scroll down" commands 00:05:58.600 --> 00:06:02.379 allow you to navigate a web page. 00:06:02.380 --> 00:06:05.819 I've put together a quiz about these commands 00:06:05.820 --> 00:06:09.559 so that you can go through this quiz several times 00:06:09.560 --> 00:06:14.699 until you get at least 90 percent of them correct, 00:06:14.700 --> 00:06:16.679 90 percent of the questions correct.
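NOTE Sketch of the quiz idea in Python, added for illustration only. It is not the quiz from the talk: the names QUIZ, score, and passed are hypothetical, and the three questions simply reuse commands named above ("stop dictation", "copy that", "press enter"). It shows one way to check answers against the 90 percent target.
```python
# Hypothetical question bank: task description -> spoken command from the talk.
QUIZ = {
    "Turn off dictation": "stop dictation",
    "Copy the current selection": "copy that",
    "Submit text written in a web form": "press enter",
}
def score(answers, quiz=QUIZ):
    """Return the fraction of questions answered correctly (case-insensitive)."""
    correct = sum(
        1
        for prompt, expected in quiz.items()
        if answers.get(prompt, "").strip().lower() == expected
    )
    return correct / len(quiz)
def passed(answers, threshold=0.9):
    """True when the score reaches the 90 percent target (an assumed default)."""
    return score(answers) >= threshold
```
Unanswered questions count as wrong here, so a partial run through the quiz cannot reach the threshold by accident.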
00:06:16.680 --> 00:06:20.599 In order to boost your recall of the commands, 00:06:20.600 --> 00:06:23.799 I have a Python script that you can probably 00:06:23.800 --> 00:06:26.559 pound through the quiz with 00:06:26.560 --> 00:06:32.159 in less than a minute, once you know the commands. 00:06:32.160 --> 00:06:35.599 I also provide an Elisp version of this quiz, 00:06:35.600 --> 00:06:41.739 but it's a little slower to operate. NOTE Common errors made by Voice In 00:06:41.740 --> 00:06:43.399 These are some common errors 00:06:43.400 --> 00:06:45.399 that I've run into with Voice In. 00:06:45.400 --> 00:06:50.319 It likes to contract statements like "I will" into "I'll". 00:06:50.320 --> 00:06:55.599 Contractions are not used in formal writing, 00:06:55.600 --> 00:07:00.359 and most of my writing is formal writing, so this annoys me. 00:07:00.360 --> 00:07:04.759 I will show you how I corrected for that problem. 00:07:04.760 --> 00:07:10.039 It also drops the first word in sentences quite often. 00:07:10.040 --> 00:07:13.359 This might be some speech issue that I have. 00:07:13.360 --> 00:07:17.599 It inserts the wrong word because it's not in the dictionary 00:07:17.600 --> 00:07:22.619 that was used to train it. So, for example, 00:07:22.620 --> 00:07:26.919 the word PyMOL is the name of a molecular graphics program 00:07:26.920 --> 00:07:31.639 that we use in our field. It doesn't recognize PyMOL. 00:07:31.640 --> 00:07:34.239 Instead, it substitutes in the word "primal". 00:07:34.240 --> 00:07:38.399 Since I don't use "primal" very often, 00:07:38.400 --> 00:07:42.299 I've mapped the word "primal" to "PyMOL" 00:07:42.300 --> 00:07:45.659 in some custom commands I'll talk about in a minute. 
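NOTE Sketch of the word-mapping idea in Python, added for illustration only. It is not how Voice In is implemented: the name apply_corrections and the two-entry table are assumptions that mirror the "primal" to "PyMOL" and "I'll" to "I will" examples from the talk.
```python
import re
# Hypothetical correction table: misrecognized or unwanted form -> intended text.
CORRECTIONS = {
    "primal": "PyMOL",
    "I'll": "I will",
}
def apply_corrections(text, corrections=CORRECTIONS):
    """Replace whole-word, case-sensitive matches of each key with its value."""
    for wrong, right in corrections.items():
        # The lookarounds keep "primal" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
        text = re.sub(pattern, right, text)
    return text
print(apply_corrections("I'll open the map in primal."))
```
The trade-off noted in the talk applies to this sketch too: once "primal" is in the table, the word can no longer be dictated as itself.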
00:07:45.660 --> 00:07:50.439 Then there's a problem that the commands that exist 00:07:50.440 --> 00:07:54.439 might get executed when you speak them when, in fact, 00:07:54.440 --> 00:07:58.839 you wanted to use the words in those commands 00:07:58.840 --> 00:08:01.439 during your dictation. 00:08:01.440 --> 00:08:07.119 So this is a problem, a pitfall of Voice In, 00:08:07.120 --> 00:08:08.919 in that it doesn't have a command mode 00:08:08.920 --> 00:08:14.759 that's separate from a dictation mode. NOTE Custom speech-to-text commands 00:08:14.760 --> 00:08:20.319 You can set up through a very easy-to-use GUI 00:08:20.320 --> 00:08:26.959 custom voice commands mapped to what you want inserted, 00:08:26.960 --> 00:08:32.399 so this is how misinterpreted words can be corrected. 00:08:32.400 --> 00:08:35.759 You just map the misinterpreted word to the intended word. 00:08:35.760 --> 00:08:42.839 You can also map the contractions to their expansions. 00:08:42.840 --> 00:08:46.959 I did this for 94 English contractions, 00:08:46.960 --> 00:08:50.139 and you can find these on GitHub. 00:08:50.140 --> 00:08:56.079 You can also insert acronyms and expand those acronyms. 00:08:56.080 --> 00:09:00.239 I apply the same approach to the first names of colleagues. 00:09:00.240 --> 00:09:03.759 I say "expand Fred", for example, 00:09:03.760 --> 00:09:06.999 to get Fred's first and last name 00:09:07.000 --> 00:09:12.599 with the [correct] spelling of his very long German name. 00:09:12.600 --> 00:09:19.399 You can also insert other trivia like favorite URLs. 00:09:19.400 --> 00:09:24.559 You can insert LaTeX snippets. 00:09:24.560 --> 00:09:34.799 It handles correctly multi-line snippets. 00:09:34.800 --> 00:09:39.419 You just have to enclose them in double quotes. 00:09:39.420 --> 00:09:45.039 You can even insert BibTeX cite keys for references 00:09:45.040 --> 00:09:46.879 that you use frequently. 
All fields 00:09:46.880 --> 00:09:59.419 have certain key references for certain methods or topics. NOTE Custom speech-to-commands 00:09:59.420 --> 00:10:05.079 Then it has a set of commands that you can customize 00:10:05.080 --> 00:10:08.199 for the purpose of speech-to-commands 00:10:08.200 --> 00:10:09.679 to get the computer to do something 00:10:09.680 --> 00:10:15.399 like open up a specific website or save the current writing. 00:10:15.400 --> 00:10:23.540 In this case, we have "press: command-s" 00:10:23.541 --> 00:10:27.759 for saving current writing. 00:10:27.760 --> 00:10:28.099 You can change the language [with "lang:"], 00:10:28.100 --> 00:10:37.539 and you can change the case of the text [with "case:"]. NOTE Introducing Talon Voice 00:10:37.540 --> 00:10:41.039 But the speech-to-command repertoire is quite limited 00:10:41.040 --> 00:10:49.759 in Voice In, so it's now time to pick up on Talon Voice. 00:10:49.760 --> 00:10:54.119 This is an open source project. It's free. 00:10:54.120 --> 00:10:57.399 It is highly configurable via TalonScript, 00:10:57.400 --> 00:10:58.959 which is a subset of Python. 00:10:58.960 --> 00:11:03.039 You can use either TalonScript or Python to configure it, 00:11:03.040 --> 00:11:06.279 but it's easier to code up your configuration 00:11:06.280 --> 00:11:08.399 in TalonScript. 00:11:08.400 --> 00:11:10.759 It has a Python interpreter embedded in it, 00:11:10.760 --> 00:11:12.999 so you don't have to mess around with installing 00:11:13.000 --> 00:11:14.559 yet another Python interpreter. 00:11:14.560 --> 00:11:21.519 It runs on all platforms, and it has a dictation mode 00:11:21.520 --> 00:11:24.599 that's separate from a command mode. 00:11:24.600 --> 00:11:25.599 You can activate it, 00:11:25.600 --> 00:11:31.359 and it'll be in a listening state asleep. 
00:11:31.360 --> 00:11:36.279 You just bark out "Talon Wake" to start to wake it up, 00:11:36.280 --> 00:11:43.799 and "Talon Sleep" to have it go into a listening state. 00:11:43.800 --> 00:11:47.919 It has a very welcoming community 00:11:47.920 --> 00:11:50.919 in the Talon Slack channel. 00:11:50.920 --> 00:11:56.399 Then I need to point out that there's several packages 00:11:56.400 --> 00:11:59.199 that others have developed that run on top of Talon, 00:11:59.200 --> 00:12:03.079 but one of particular note is by Pokey Rule. 00:12:03.080 --> 00:12:08.119 He has on his website some really well-done videos 00:12:08.120 --> 00:12:11.479 that demonstrate how he uses Cursorless 00:12:11.480 --> 00:12:17.239 to move the cursor around using voice commands. 00:12:17.240 --> 00:12:20.559 This, however, runs on VS Code. 00:12:20.560 --> 00:12:23.359 At least that's the text editor 00:12:23.360 --> 00:12:28.399 for which he's primarily developing Cursorless. NOTE Talon GUI 00:12:28.400 --> 00:12:35.519 I followed the [install] protocol outlined by Tara Roys. 00:12:35.520 --> 00:12:38.759 She has a collection of tutorials 00:12:38.760 --> 00:12:44.599 on YouTube as well as on GitHub that are quite helpful. 00:12:44.600 --> 00:12:49.479 I followed her tutorial for installing 00:12:49.480 --> 00:12:51.359 Talon on macOS without any issues, 00:12:51.360 --> 00:12:55.319 but allow for half an hour to an hour 00:12:55.320 --> 00:12:57.719 to go through the process. When you're done, 00:12:57.720 --> 00:13:02.199 you'll have this Talon icon appear in the toolbar 00:13:02.200 --> 00:13:06.119 on the Mac. When it has this diagonal line across it, 00:13:06.120 --> 00:13:09.539 that means it's in the sleep state. 00:13:09.540 --> 00:13:13.519 So, this leads to cascading pull-down menus. 00:13:13.520 --> 00:13:19.639 This is it for the GUI. 
00:13:19.640 --> 00:13:26.519 One of your first tasks is to select 00:13:26.520 --> 00:13:30.439 a language model that will be used to interpret 00:13:30.440 --> 00:13:35.179 the sounds that you generate as words. 00:13:35.180 --> 00:13:38.959 The other key feature is that, 00:13:38.960 --> 00:13:43.399 under Scripting, there's a View Log pull-down 00:13:43.400 --> 00:13:48.399 that opens up a window displaying the log file. 00:13:48.400 --> 00:13:52.879 Whenever you make a change in a Talon configuration file, 00:13:52.880 --> 00:13:55.079 that change is implemented immediately. 00:13:55.080 --> 00:13:57.599 You do not have to restart Talon 00:13:57.600 --> 00:14:02.539 to get the change to take effect. NOTE Talon file with web scope 00:14:02.540 --> 00:14:04.759 This is an example of a Talon file. 00:14:04.760 --> 00:14:10.499 It has two components. It has a header above the dash that describes 00:14:10.500 --> 00:14:14.919 the scope of the commands contained below the dash. 00:14:14.920 --> 00:14:19.739 Commands are separated by blank lines. 00:14:19.740 --> 00:14:24.239 If a voice command is mapped to multiple actions, 00:14:24.240 --> 00:14:30.999 these are listed separately on indented lines 00:14:31.000 --> 00:14:33.599 below the first line. 00:14:33.600 --> 00:14:39.419 The words that are in square brackets are optional. 00:14:39.420 --> 00:14:44.319 So, I have mapped the phrase 00:14:44.320 --> 00:14:46.319 "toggle voice in" 00:14:46.320 --> 00:14:51.279 to the keyboard shortcut Alt-L 00:14:51.280 --> 00:14:54.999 in order to toggle Voice In on or off. 00:14:55.000 --> 00:14:57.879 If I toggle Voice In on, 00:14:57.880 --> 00:15:01.759 I need to immediately toggle off Talon, 00:15:01.760 --> 00:15:09.079 and this is done through this key command for Control-T, 00:15:09.080 --> 00:15:11.079 which is mapped to speech toggle. 00:15:11.080 --> 00:15:20.399 Speech toggle.
Then there are 00:15:20.400 --> 00:15:24.079 a couple of other examples. 00:15:24.080 --> 00:15:26.439 So, if there's no header present 00:15:26.440 --> 00:15:29.599 (it's an optional feature of Talon files), 00:15:29.600 --> 00:15:32.639 then the commands in the file will apply in all situations, 00:15:32.640 --> 00:15:34.014 in all modes. NOTE Terminals on remote and virtual machines 00:15:34.015 --> 00:15:36.959 Here we have two restrictions. 00:15:36.960 --> 00:15:38.959 These commands will only work 00:15:38.960 --> 00:15:42.959 when using the iTerm2 terminal emulator for the Mac, 00:15:42.960 --> 00:15:48.239 and then only when the title of the window in iTerm2 00:15:48.240 --> 00:15:52.439 has this particular address, 00:15:52.440 --> 00:15:55.559 which is what appears when I've logged into 00:15:55.560 --> 00:16:00.059 the supercomputer at the University of Oklahoma. 00:16:00.060 --> 00:16:03.479 One of the commands in this file is checkjobs. 00:16:03.480 --> 00:16:05.539 It's mapped to an alias, 00:16:05.540 --> 00:16:10.919 a bash alias called cj for "check jobs", 00:16:10.920 --> 00:16:17.079 which in turn is mapped to a script called checkjobs.sh 00:16:17.080 --> 00:16:20.399 that, when it's run, returns a listing 00:16:20.400 --> 00:16:23.219 of the pending and running jobs on the supercomputer 00:16:23.220 --> 00:16:26.080 in a format that I find pleasing. 00:16:26.081 --> 00:16:34.559 This `\n` after cj, the newline character, 00:16:34.560 --> 00:16:39.839 enters the command, so I don't have to do that 00:16:39.840 --> 00:16:43.799 as an additional step. Likewise, 00:16:43.800 --> 00:16:46.799 here's a similar setup for interacting with 00:16:46.800 --> 00:16:52.499 an Ubuntu virtual machine. NOTE Recommendations 00:16:52.500 --> 00:16:55.919 In terms of picking up voice computing, 00:16:55.920 --> 00:16:57.479 these are my recommendations.
00:16:57.480 --> 00:16:59.759 You're going to run into more errors 00:16:59.760 --> 00:17:01.479 than you may like initially, 00:17:01.480 --> 00:17:07.839 and so you need some patience in dealing with those. 00:17:07.840 --> 00:17:09.919 And also, it'll take you a while 00:17:09.920 --> 00:17:16.799 to get your head wrapped around Talon and how it works. 00:17:16.800 --> 00:17:19.439 You'll definitely want to use these custom commands 00:17:19.440 --> 00:17:21.479 to correct the errors or shortcomings 00:17:21.480 --> 00:17:26.919 of the language models. And you've seen how, 00:17:26.920 --> 00:17:29.879 by opening up projects by voice commands, 00:17:29.880 --> 00:17:31.359 you can reduce friction 00:17:31.360 --> 00:17:36.659 in terms of restarting work on a project. 00:17:36.660 --> 00:17:40.399 You've seen how Voice In is preferred 00:17:40.400 --> 00:17:44.879 for more accurate dictation. 00:17:44.880 --> 00:17:48.079 I think my error rate is about 1 to 2 percent. 00:17:48.080 --> 00:17:53.879 That is, 1 to 2 out of 100 words are incorrect 00:17:53.880 --> 00:17:56.319 versus Talon Voice where I think 00:17:56.320 --> 00:17:59.879 the error rate is closer to 5 percent. 00:18:00.840 --> 00:18:03.507 I have put together [a library of English] contractions 00:18:03.508 --> 00:18:04.880 [and their expansion] for Talon [too], 00:18:04.881 --> 00:18:07.479 and they can be found here on GitHub. 00:18:07.480 --> 00:18:12.959 And I also have [posted] a quiz of 600 questions 00:18:12.960 --> 00:18:17.719 about some basic Talon commands. NOTE Acknowledgements 00:18:17.720 --> 00:18:20.999 I'd like to thank the people who've helped me out 00:18:21.000 --> 00:18:22.159 on the Talon Slack channel 00:18:22.160 --> 00:18:25.799 and members of the Oklahoma Data Science Workshop 00:18:25.800 --> 00:18:29.879 where I gave an hour-long talk on this topic 00:18:29.880 --> 00:18:30.959 several weeks ago. 
00:18:30.960 --> 00:18:34.159 I'd like to thank my friends 00:18:34.160 --> 00:18:37.399 at the Berlin and Austin Emacs Meetup 00:18:37.400 --> 00:18:42.659 and at the M-x research Slack channel. 00:18:42.660 --> 00:18:45.119 And I thank these grant funding agencies 00:18:45.120 --> 00:18:48.880 for supporting my work. I'll be happy to take any questions.