From 626247c0ca28bda8987a65279033d3fd2b96284c Mon Sep 17 00:00:00 2001 From: Sacha Chua Date: Tue, 5 Dec 2023 15:38:47 -0500 Subject: incorporate voice changes into vtt, make chapters --- ...y-with-voice-computing--blaine-mooers--main.vtt | 137 ++++++++++++--------- 1 file changed, 81 insertions(+), 56 deletions(-) (limited to '2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt') diff --git a/2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt b/2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt index 650d2d49..5ff59fdc 100644 --- a/2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt +++ b/2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt @@ -1,5 +1,7 @@ WEBVTT captioned by sachac +NOTE Introduction + 00:00:00.000 --> 00:00:04.359 Hi, I'm Blaine Mooers. I'm an associate professor @@ -33,6 +35,8 @@ I was seeking ways of using voice computing 00:00:33.040 --> 00:00:37.399 to try to enhance my productivity. +NOTE Three activities in voice computing + 00:00:37.400 --> 00:00:41.319 I divide voice computing into three activities, @@ -54,17 +58,19 @@ that are probably most broadly applicable 00:00:57.320 --> 00:01:02.559 to the workflows of people attending this conference. +NOTE Talk is not about ... and about ... + 00:01:02.560 --> 00:01:06.799 This talk will not be about Emacspeak. 00:01:06.800 --> 00:01:11.359 -This is a verbal program for converting text to speech. +This is a venerated program for converting text to speech. 00:01:11.360 --> 00:01:13.319 We're talking about the flow of information 00:01:13.320 --> 00:01:16.519 -opposite direction, speech-to-text. +in the opposite direction, speech-to-text. 00:01:16.520 --> 00:01:20.599 We need an Emacs Listens. We don't have one, @@ -99,7 +105,7 @@ and it's also great at speech-to-code. NOTE Motivations 00:01:53.520 --> 00:01:57.239 -So, the motivations are, obviously, as I mentioned already, +The motivations are, obviously, as I mentioned already, 00:01:57.240 --> 00:01:59.159 for improved productivity. @@ -209,7 +215,7 @@ I adopted the use of voice computing in the middle of August. As you can see, 00:03:53.920 --> 00:03:58.679 -I got a over three-fold increase in my output. +I got an over three-fold increase in my output. NOTE Voice In in the Chrome Store @@ -217,7 +223,7 @@ NOTE Voice In in the Chrome Store So this is the Chrome store website for voice-in. 00:04:07.120 --> 00:04:11.119 -So it's only available for Google Chrome. +It's only available for Google Chrome. 00:04:11.120 --> 00:04:13.239 You just hit the install button to install it. @@ -231,8 +237,13 @@ It has support for 40 languages 00:04:19.560 --> 00:04:23.119 and it supports about a dozen different dialects of English, -00:04:23.120 --> 00:04:29.959 -including Australian. It works on web pages with text areas, +00:04:23.120 --> 00:04:25.627 +including Australian. + +NOTE Works in web pages with text areas + +00:04:25.628 --> 00:04:29.959 +It works on web pages with text areas, 00:04:29.960 --> 00:04:33.319 so it works. I use it regularly @@ -265,15 +276,15 @@ I've mapped option-L to opening Voice In when the cursor is on a web page that has a text area. 00:05:09.120 --> 00:05:16.879 -So that's the main limiting factor. +So [the presence of a text area is] the main limiting factor. NOTE Built-in commands in Voice In Plus 00:05:16.880 --> 00:05:19.159 -So it has a number of built-in commands. +[Voice In] has a number of built-in commands. 00:05:19.160 --> 00:05:24.879 -You can turn it off by saying stop dictation. +You can turn it off by saying "stop dictation". 00:05:24.880 --> 00:05:26.119 It doesn't distinguish between @@ -282,16 +293,16 @@ It doesn't distinguish between a command mode and a dictation mode. 00:05:28.800 --> 00:05:33.599 -It has undo command. When you use a command, +It has undo command. You use the command 00:05:33.600 --> 00:05:36.919 -copy that to a copy of selection. +"copy that" to copy a selection. 00:05:36.920 --> 00:05:40.079 -And the `press` commands are used in the browser, +The "press" commands are used in the browser. 00:05:40.080 --> 00:05:44.839 -so you press Enter to issue a command or a text +You [say] "press enter" to issue a command or [submit] text 00:05:44.840 --> 00:05:50.319 that has been written in a web form, @@ -335,7 +346,7 @@ I also provide an Elisp version of this quiz, 00:06:35.600 --> 00:06:41.739 but it's a little slower to operate. -NOTE Common errors +NOTE Common errors made by Voice In 00:06:41.740 --> 00:06:43.399 These are some common errors @@ -368,7 +379,7 @@ It inserts the wrong word because it's not in the dictionary that was used to train it. So, for example, 00:07:22.620 --> 00:07:26.919 -the word PyMOL is the name of a lexicographic program +the word PyMOL is the name of a molecular graphics program 00:07:26.920 --> 00:07:31.639 that we use in our field. It doesn't recognize PyMOL. @@ -409,13 +420,13 @@ that's separate from a dictation mode. NOTE Custom speech-to-text commands 00:08:14.760 --> 00:08:20.319 -So you can set up through a very easy-to-use GUI +You can set up through a very easy-to-use GUI 00:08:20.320 --> 00:08:26.959 -custom voice commands mapped to what you want inserted. +custom voice commands mapped to what you want inserted, 00:08:26.960 --> 00:08:32.399 -So this is how misinterpreted words can be corrected. +so this is how misinterpreted words can be corrected. 00:08:32.400 --> 00:08:35.759 You just map the misinterpreted word to the intended word. @@ -427,7 +438,7 @@ You can also map the contractions to their expansions. I did this for 94 English contractions, 00:08:46.960 --> 00:08:50.139 -and you can find this on GitHub. +and you can find these on GitHub. 00:08:50.140 --> 00:08:56.079 You can also insert acronyms and expand those acronyms. @@ -439,19 +450,19 @@ I apply the same approach to the first names of colleagues. I say "expand Fred", for example, 00:09:03.760 --> 00:09:06.999 -to get Fred's first and last name with the spelling +to get Fred's first and last name 00:09:07.000 --> 00:09:12.599 -of his very long German name. +with the [correct] spelling of his very long German name. 00:09:12.600 --> 00:09:19.399 You can also insert other trivia like favorite URLs. 00:09:19.400 --> 00:09:24.559 -You can insert a lot of text snippets, +You can insert LaTeX snippets. 00:09:24.560 --> 00:09:34.799 -and so it handles correctly multi-line snippets. +It handles correctly multi-line snippets. 00:09:34.800 --> 00:09:39.419 You just have to enclose them in double quotes. @@ -465,11 +476,13 @@ that you use frequently. All fields 00:09:46.880 --> 00:09:59.419 have certain key references for certain methods or topics. +NOTE Custom speech-to-commands + 00:09:59.420 --> 00:10:05.079 Then it has a set of commands that you can customize 00:10:05.080 --> 00:10:08.199 -for the purpose of speech to commands +for the purpose of speech-to-commands 00:10:08.200 --> 00:10:09.679 to get the computer to do something @@ -477,22 +490,22 @@ to get the computer to do something 00:10:09.680 --> 00:10:15.399 like open up a specific website or save the current writing. -00:10:15.400 --> 00:10:19.919 -In this case, we have "press" is a mapping of +00:10:15.400 --> 00:10:23.540 +In this case, we have "press: command-s" -00:10:19.920 --> 00:10:27.759 -is applied to the command `s` for saving current writing. +00:10:23.541 --> 00:10:27.759 +for saving current writing. 00:10:27.760 --> 00:10:28.099 -You can change the language, +You can change the language [with "lang:"], 00:10:28.100 --> 00:10:37.539 -and you can change the case of the text. +and you can change the case of the text [with "case:"]. NOTE Introducing Talon Voice 00:10:37.540 --> 00:10:41.039 -But the speech to command repertoire is quite limited +But the speech-to-command repertoire is quite limited 00:10:41.040 --> 00:10:49.759 in Voice In, so it's now time to pick up on Talon Voice. @@ -537,10 +550,10 @@ You can activate it, and it'll be in a listening state asleep. 00:11:31.360 --> 00:11:36.279 -You just bark out Talon Wake to start to wake it up, +You just bark out "Talon Wake" to start to wake it up, 00:11:36.280 --> 00:11:43.799 -and Talon Sleep to have it go into a listening state. +and "Talon Sleep" to have it go into a listening state. 00:11:43.800 --> 00:11:47.919 It has a very welcoming community @@ -578,7 +591,7 @@ for which he's primarily developing Cursorless. NOTE Talon GUI 00:12:28.400 --> 00:12:35.519 -So, I followed the protocol outlined by Tara Roys. +I followed the [install] protocol outlined by Tara Roys. 00:12:35.520 --> 00:12:38.759 She has a collection of tutorials @@ -590,7 +603,7 @@ on YouTube as well as on GitHub that are quite helpful. I followed her tutorial for installing 00:12:49.480 --> 00:12:51.359 -Talend on macOS without any issues, +Talon on macOS without any issues, 00:12:51.360 --> 00:12:55.319 but allow for half an hour to an hour @@ -611,13 +624,13 @@ that means it's in the sleep state. So, this leads to cascading pull-down menus. 00:13:13.520 --> 00:13:19.639 -This is it for the GUI interface. +This is it for the GUI. 00:13:19.640 --> 00:13:26.519 -One of your first tasks is to select a large language model +One of your first tasks is to select 00:13:26.520 --> 00:13:30.439 -or language model that will be used to interpret +a language model that will be used to interpret 00:13:30.440 --> 00:13:35.179 the sounds that you generate as words. @@ -643,8 +656,10 @@ You do not have to restart Talon 00:13:57.600 --> 00:14:02.539 to get the change to take effect. +NOTE Talon file with web scope + 00:14:02.540 --> 00:14:04.759 -So, this is an example of a Talon file. +This is an example of a Talon file. 00:14:04.760 --> 00:14:10.499 It has two components. It has a header above the dash that describes @@ -706,20 +721,25 @@ it's an optional feature of Talon files, 00:15:29.600 --> 00:15:32.639 then the commands in the file will apply in all situations, -00:15:32.640 --> 00:15:36.959 -in all modes. Here we have two restrictions. +00:15:32.640 --> 00:15:34.014 +in all modes. + +NOTE Terminals on remote and virtual machines + +00:15:34.015 --> 00:15:36.959 +Here we have two restrictions. 00:15:36.960 --> 00:15:38.959 -This is only, these commands will only work +These commands will only work 00:15:38.960 --> 00:15:42.959 -when using the iTerm2 terminal emulator for the Mac, +when using the iTerm2 [ccc] terminal emulator for the Mac, 00:15:42.960 --> 00:15:48.239 and then only when the title of the window in iTerm2 00:15:48.240 --> 00:15:52.439 -has this particular address, which corresponds to, +has this particular address, 00:15:52.440 --> 00:15:55.559 which is what appears when I've logged into @@ -728,7 +748,7 @@ which is what appears when I've logged into the supercomputer at the University of Oklahoma. 00:16:00.060 --> 00:16:03.479 -So, one of the commands in this file is checkjobs. +One of the commands in this file is checkjobs. 00:16:03.480 --> 00:16:05.539 It's mapped to an alias, @@ -749,13 +769,13 @@ of the pending and running jobs on the supercomputer in a format that I find pleasing. 00:16:26.081 --> 00:16:34.559 -So, this backslash n after cj, new line character, +This `\n` after cj, the new line character, 00:16:34.560 --> 00:16:39.839 -enters the command. So, I don't have to do that +enters the command, so I don't have to do that 00:16:39.840 --> 00:16:43.799 -as an additional step. And then, likewise, +as an additional step. Likewise, 00:16:43.800 --> 00:16:46.799 here's a similar setup for interacting with @@ -766,7 +786,7 @@ a Ubuntu virtual machine. NOTE Recommendations 00:16:52.500 --> 00:16:55.919 -So, in terms of picking up voice computing, +In terms of picking up voice computing, 00:16:55.920 --> 00:16:57.479 these are my recommendations. @@ -822,20 +842,25 @@ versus Talon Voice where I think 00:17:56.320 --> 00:17:59.879 the error rate is closer to 5 percent. -00:18:00.840 --> 00:18:04.759 -I have put together contractions also for Talon, +00:18:00.840 --> 00:18:03.507 +I have put together [a library of English] contractions -00:18:04.760 --> 00:18:07.479 +00:18:03.508 --> 00:18:04.880 +[and their expansion] for Talon [too], + +00:18:04.881 --> 00:18:07.479 and they can be found here on GitHub. 00:18:07.480 --> 00:18:12.959 -And I also have a quiz of 600 questions +And I also have [posted] a quiz of 600 questions 00:18:12.960 --> 00:18:17.719 about some basic Talon commands. +NOTE Acknowledgements + 00:18:17.720 --> 00:18:20.999 -So, I'd like to thank the people who've helped me out +I'd like to thank the people who've helped me out 00:18:21.000 --> 00:18:22.159 on the Talon Slack channel @@ -856,7 +881,7 @@ I'd like to thank my friends at the Berlin and Austin Emacs Meetup 00:18:37.400 --> 00:18:42.659 -and at the M-x Research Slack channel. +and at the M-x research Slack channel. 00:18:42.660 --> 00:18:45.119 And I thank these grant funding agencies -- cgit v1.2.3