From 53cd704e22c01f42c6e8730a320300f3812bff30 Mon Sep 17 00:00:00 2001 From: EmacsConf Date: Sat, 2 Dec 2023 10:20:15 -0500 Subject: Automated commit --- ...y-with-voice-computing--blaine-mooers--main.vtt | 865 +++++++++++++++++++++ 2023/info/voice-after.md | 286 +++++++ 2023/info/voice-before.md | 4 +- 3 files changed, 1153 insertions(+), 2 deletions(-) create mode 100644 2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt (limited to '2023') diff --git a/2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt b/2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt new file mode 100644 index 00000000..650d2d49 --- /dev/null +++ b/2023/captions/emacsconf-2023-voice--enhancing-productivity-with-voice-computing--blaine-mooers--main.vtt @@ -0,0 +1,865 @@ +WEBVTT captioned by sachac + +00:00:00.000 --> 00:00:04.359 +Hi, I'm Blaine Mooers. I'm an associate professor + +00:00:04.360 --> 00:00:06.519 +of biochemistry at the University of Oklahoma + +00:00:06.520 --> 00:00:09.319 +Health Sciences Center in Oklahoma City. + +00:00:09.320 --> 00:00:12.959 +My lab studies the role of RNA structure in RNA editing. + +00:00:12.960 --> 00:00:17.199 +We use X-ray crystallography to study the structures + +00:00:17.200 --> 00:00:19.919 +of these RNAs. We spend a lot of time in the lab + +00:00:19.920 --> 00:00:22.719 +preparing our samples for structural studies, + +00:00:22.720 --> 00:00:26.719 +and then we also spend a lot of time at the computer + +00:00:26.720 --> 00:00:29.719 +analyzing the resulting data. + +00:00:29.720 --> 00:00:33.039 +I was seeking ways of using voice computing + +00:00:33.040 --> 00:00:37.399 +to try to enhance my productivity. 
+ +00:00:37.400 --> 00:00:41.319 +I divide voice computing into three activities, + +00:00:41.320 --> 00:00:44.959 +speech-to-text or dictation, speech-to-commands, + +00:00:44.960 --> 00:00:47.639 +and speech-to-code. I'll be talking about + +00:00:47.640 --> 00:00:50.159 +speech-to-text and speech-to-commands today + +00:00:50.160 --> 00:00:55.079 +because these are two activities + +00:00:55.080 --> 00:00:57.319 +that are probably most broadly applicable + +00:00:57.320 --> 00:01:02.559 +to the workflows of people attending this conference. + +00:01:02.560 --> 00:01:06.799 +This talk will not be about Emacspeak. + +00:01:06.800 --> 00:01:11.359 +This is a verbal program for converting text to speech. + +00:01:11.360 --> 00:01:13.319 +We're talking about the flow of information + +00:01:13.320 --> 00:01:16.519 +opposite direction, speech-to-text. + +00:01:16.520 --> 00:01:20.599 +We need an Emacs Listens. We don't have one, + +00:01:20.600 --> 00:01:25.479 +so I had to seek help from outside the Emacs world + +00:01:25.480 --> 00:01:30.639 +via the Voice In Plus. This runs in + +00:01:30.640 --> 00:01:33.639 +the Google Chrome web browser, + +00:01:33.640 --> 00:01:36.719 +and it's very good for speech-to-text + +00:01:36.720 --> 00:01:39.519 +and very easy to learn how to use. + +00:01:39.520 --> 00:01:41.999 +It also has some speech-to-commands. + +00:01:42.000 --> 00:01:44.799 +However, Talon Voice is much better + +00:01:44.800 --> 00:01:47.559 +with the speech-to-commands, + +00:01:47.560 --> 00:01:53.519 +and it's also great at speech-to-code. + +NOTE Motivations + +00:01:53.520 --> 00:01:57.239 +So, the motivations are, obviously, as I mentioned already, + +00:01:57.240 --> 00:01:59.159 +for improved productivity. 
+ +00:01:59.160 --> 00:02:00.399 +So, if you're a fast typist + +00:02:00.400 --> 00:02:05.199 +who types faster than they can speak, + +00:02:05.200 --> 00:02:07.079 +then nonetheless you might still benefit + +00:02:07.080 --> 00:02:09.279 +from voice computing when you grow tired of + +00:02:09.280 --> 00:02:12.199 +using the keyboard. On the other hand, + +00:02:12.200 --> 00:02:15.199 +you might be a slow typist who talks faster + +00:02:15.200 --> 00:02:17.519 +than they can type. + +00:02:17.520 --> 00:02:19.759 +In this case, you're definitely going to + +00:02:19.760 --> 00:02:22.859 +benefit from dictation because you'll be able to + +00:02:22.860 --> 00:02:29.359 +encode more words in text documents in a given day. + +00:02:29.360 --> 00:02:33.639 +If you're a coder, then you may get a kick out of + +00:02:33.640 --> 00:02:36.999 +opening programs and websites and coding projects + +00:02:37.000 --> 00:02:39.279 +by using your voice. + +00:02:39.280 --> 00:02:41.719 +Then there are health-related reasons. + +00:02:41.720 --> 00:02:44.599 +You may have impaired use of your hands, eyes, or both + +00:02:44.600 --> 00:02:49.199 +due to accident or disease, or you may suffer from + +00:02:49.200 --> 00:02:53.519 +a repetitive stress injury. Many of us have this + +00:02:53.520 --> 00:02:55.759 +in a mild but chronic form of it. + +00:02:55.760 --> 00:02:59.039 +We can't take a three-month sabbatical from the keyboard + +00:02:59.040 --> 00:03:05.519 +without losing our jobs, so these injuries tend to persist. + +00:03:05.520 --> 00:03:06.679 +And then you may have learned + +00:03:06.680 --> 00:03:09.959 +that it's not good for your health to sit + +00:03:09.960 --> 00:03:11.919 +for prolonged periods of time + +00:03:11.920 --> 00:03:14.919 +with your staring at a computer screen. 
+ +00:03:14.920 --> 00:03:21.799 +You can actually dictate to your computer from 20 feet away + +00:03:21.800 --> 00:03:24.999 +while looking out the window, + +00:03:25.000 --> 00:03:27.779 +thereby giving your lower body a break + +00:03:27.780 --> 00:03:33.239 +and your eyes a break. + +NOTE Data + +00:03:33.240 --> 00:03:35.639 +I'm not God, so I have to bring data. + +00:03:35.640 --> 00:03:38.039 +I have two data points here, + +00:03:38.040 --> 00:03:42.399 +the number of words that I wrote in June and July this year + +00:03:42.400 --> 00:03:45.159 +and in September and October. + +00:03:45.160 --> 00:03:49.519 +I adopted the use of voice computing + +00:03:49.520 --> 00:03:53.919 +in the middle of August. As you can see, + +00:03:53.920 --> 00:03:58.679 +I got a over three-fold increase in my output. + +NOTE Voice In in the Chrome Store + +00:03:58.680 --> 00:04:07.119 +So this is the Chrome store website for voice-in. + +00:04:07.120 --> 00:04:11.119 +So it's only available for Google Chrome. + +00:04:11.120 --> 00:04:13.239 +You just hit the install button to install it. + +00:04:13.240 --> 00:04:16.639 +To configure it, you need to select a language. + +00:04:16.640 --> 00:04:19.559 +It has support for 40 languages + +00:04:19.560 --> 00:04:23.119 +and it supports about a dozen different dialects of English, + +00:04:23.120 --> 00:04:29.959 +including Australian. It works on web pages with text areas, + +00:04:29.960 --> 00:04:33.319 +so it works. I use it regularly + +00:04:33.320 --> 00:04:37.879 +on Overleaf and 750words.com, + +00:04:37.880 --> 00:04:42.279 +a distraction-free environment for writing. + +00:04:42.280 --> 00:04:46.239 +It also works in webmails. It works in Google. + +00:04:46.780 --> 00:04:51.319 +It works in Jupyter Lab, of course, + +00:04:51.320 --> 00:04:52.879 +because that runs in the browser. + +00:04:52.880 --> 00:04:57.999 +It also works in Jupyter Notebook and Colab Notebook. 
+ +00:04:58.000 --> 00:05:01.319 +It should work in Cloudmacs. + +00:05:01.320 --> 00:05:04.159 +I've mapped option-L to opening Voice In + +00:05:04.160 --> 00:05:09.119 +when the cursor is on a web page that has a text area. + +00:05:09.120 --> 00:05:16.879 +So that's the main limiting factor. + +NOTE Built-in commands in Voice In Plus + +00:05:16.880 --> 00:05:19.159 +So it has a number of built-in commands. + +00:05:19.160 --> 00:05:24.879 +You can turn it off by saying stop dictation. + +00:05:24.880 --> 00:05:26.119 +It doesn't distinguish between + +00:05:26.120 --> 00:05:28.799 +a command mode and a dictation mode. + +00:05:28.800 --> 00:05:33.599 +It has undo command. When you use a command, + +00:05:33.600 --> 00:05:36.919 +copy that to a copy of selection. + +00:05:36.920 --> 00:05:40.079 +And the `press` commands are used in the browser, + +00:05:40.080 --> 00:05:44.839 +so you press Enter to issue a command or a text + +00:05:44.840 --> 00:05:50.319 +that has been written in a web form, + +00:05:50.320 --> 00:05:55.279 +and then "press tab" will open up the next tab + +00:05:55.280 --> 00:05:58.599 +in a web browser. The scroll up and down + +00:05:58.600 --> 00:06:02.379 +will allow you to navigate a web page. + +00:06:02.380 --> 00:06:05.819 +I've put together a quiz about these commands + +00:06:05.820 --> 00:06:09.559 +so that you can go through this quiz several times + +00:06:09.560 --> 00:06:14.699 +until you get at least 90 percent of them correct, + +00:06:14.700 --> 00:06:16.679 +90 percent of the questions correct. + +00:06:16.680 --> 00:06:20.599 +In order to boost your recall of the commands, + +00:06:20.600 --> 00:06:23.799 +I have a Python script that you can probably + +00:06:23.800 --> 00:06:26.559 +pound through the quiz with + +00:06:26.560 --> 00:06:32.159 +in less than a minute, once you know the commands. 
+ +00:06:32.160 --> 00:06:35.599 +I also provide an Elisp version of this quiz, + +00:06:35.600 --> 00:06:41.739 +but it's a little slower to operate. + +NOTE Common errors + +00:06:41.740 --> 00:06:43.399 +These are some common errors + +00:06:43.400 --> 00:06:45.399 +that I've run into with Voice In. + +00:06:45.400 --> 00:06:50.319 +It likes to contract statements like "I will" into "I'll". + +00:06:50.320 --> 00:06:55.599 +Contractions are not used in formal writing, + +00:06:55.600 --> 00:07:00.359 +and most of my writing is formal writing, so this annoys me. + +00:07:00.360 --> 00:07:04.759 +I will show you how I corrected for that problem. + +00:07:04.760 --> 00:07:10.039 +It also drops the first word in sentences quite often. + +00:07:10.040 --> 00:07:13.359 +This might be some speech issue that I have. + +00:07:13.360 --> 00:07:17.599 +It inserts the wrong word because it's not in the dictionary + +00:07:17.600 --> 00:07:22.619 +that was used to train it. So, for example, + +00:07:22.620 --> 00:07:26.919 +the word PyMOL is the name of a molecular graphics program + +00:07:26.920 --> 00:07:31.639 +that we use in our field. It doesn't recognize PyMOL. + +00:07:31.640 --> 00:07:34.239 +Instead, it substitutes in the word "primal". + +00:07:34.240 --> 00:07:38.399 +Since I don't use "primal" very often, + +00:07:38.400 --> 00:07:42.299 +I've mapped the word "primal" to "PyMOL" + +00:07:42.300 --> 00:07:45.659 +in some custom commands I'll talk about in a minute. + +00:07:45.660 --> 00:07:50.439 +Then there's a problem that the commands that exist + +00:07:50.440 --> 00:07:54.439 +might get executed when you speak them when, in fact, + +00:07:54.440 --> 00:07:58.839 +you wanted to use the words in those commands + +00:07:58.840 --> 00:08:01.439 +during your dictation. 
+ +00:08:01.440 --> 00:08:07.119 +So this is a problem, a pitfall of Voice In, + +00:08:07.120 --> 00:08:08.919 +in that it doesn't have a command mode + +00:08:08.920 --> 00:08:14.759 +that's separate from a dictation mode. + +NOTE Custom speech-to-text commands + +00:08:14.760 --> 00:08:20.319 +So you can set up through a very easy-to-use GUI + +00:08:20.320 --> 00:08:26.959 +custom voice commands mapped to what you want inserted. + +00:08:26.960 --> 00:08:32.399 +So this is how misinterpreted words can be corrected. + +00:08:32.400 --> 00:08:35.759 +You just map the misinterpreted word to the intended word. + +00:08:35.760 --> 00:08:42.839 +You can also map the contractions to their expansions. + +00:08:42.840 --> 00:08:46.959 +I did this for 94 English contractions, + +00:08:46.960 --> 00:08:50.139 +and you can find this on GitHub. + +00:08:50.140 --> 00:08:56.079 +You can also insert acronyms and expand those acronyms. + +00:08:56.080 --> 00:09:00.239 +I apply the same approach to the first names of colleagues. + +00:09:00.240 --> 00:09:03.759 +I say "expand Fred", for example, + +00:09:03.760 --> 00:09:06.999 +to get Fred's first and last name with the spelling + +00:09:07.000 --> 00:09:12.599 +of his very long German name. + +00:09:12.600 --> 00:09:19.399 +You can also insert other trivia like favorite URLs. + +00:09:19.400 --> 00:09:24.559 +You can insert a lot of text snippets, + +00:09:24.560 --> 00:09:34.799 +and so it handles correctly multi-line snippets. + +00:09:34.800 --> 00:09:39.419 +You just have to enclose them in double quotes. + +00:09:39.420 --> 00:09:45.039 +You can even insert BibTeX cite keys for references + +00:09:45.040 --> 00:09:46.879 +that you use frequently. All fields + +00:09:46.880 --> 00:09:59.419 +have certain key references for certain methods or topics. 
+ +00:09:59.420 --> 00:10:05.079 +Then it has a set of commands that you can customize + +00:10:05.080 --> 00:10:08.199 +for the purpose of speech to commands + +00:10:08.200 --> 00:10:09.679 +to get the computer to do something + +00:10:09.680 --> 00:10:15.399 +like open up a specific website or save the current writing. + +00:10:15.400 --> 00:10:19.919 +In this case, we have "press" is a mapping of + +00:10:19.920 --> 00:10:27.759 +is applied to the command `s` for saving current writing. + +00:10:27.760 --> 00:10:28.099 +You can change the language, + +00:10:28.100 --> 00:10:37.539 +and you can change the case of the text. + +NOTE Introducing Talon Voice + +00:10:37.540 --> 00:10:41.039 +But the speech to command repertoire is quite limited + +00:10:41.040 --> 00:10:49.759 +in Voice In, so it's now time to pick up on Talon Voice. + +00:10:49.760 --> 00:10:54.119 +This is an open source project. It's free. + +00:10:54.120 --> 00:10:57.399 +It is highly configurable via TalonScript, + +00:10:57.400 --> 00:10:58.959 +which is a subset of Python. + +00:10:58.960 --> 00:11:03.039 +You can use either TalonScript or Python to configure it, + +00:11:03.040 --> 00:11:06.279 +but it's easier to code up your configuration + +00:11:06.280 --> 00:11:08.399 +in TalonScript. + +00:11:08.400 --> 00:11:10.759 +It has a Python interpreter embedded in it, + +00:11:10.760 --> 00:11:12.999 +so you don't have to mess around with installing + +00:11:13.000 --> 00:11:14.559 +yet another Python interpreter. + +00:11:14.560 --> 00:11:21.519 +It runs on all platforms, and it has a dictation mode + +00:11:21.520 --> 00:11:24.599 +that's separate from a command mode. + +00:11:24.600 --> 00:11:25.599 +You can activate it, + +00:11:25.600 --> 00:11:31.359 +and it'll be in a listening state asleep. + +00:11:31.360 --> 00:11:36.279 +You just bark out Talon Wake to start to wake it up, + +00:11:36.280 --> 00:11:43.799 +and Talon Sleep to have it go into a listening state. 
+ +00:11:43.800 --> 00:11:47.919 +It has a very welcoming community + +00:11:47.920 --> 00:11:50.919 +in the Talon Slack channel. + +00:11:50.920 --> 00:11:56.399 +Then I need to point out that there's several packages + +00:11:56.400 --> 00:11:59.199 +that others have developed that run on top of Talon, + +00:11:59.200 --> 00:12:03.079 +but one of particular note is by Pokey Rule. + +00:12:03.080 --> 00:12:08.119 +He has on his website some really well-done videos + +00:12:08.120 --> 00:12:11.479 +that demonstrate how he uses Cursorless + +00:12:11.480 --> 00:12:17.239 +to move the cursor around using voice commands. + +00:12:17.240 --> 00:12:20.559 +This, however, runs on VS Code. + +00:12:20.560 --> 00:12:23.359 +At least that's the text editor + +00:12:23.360 --> 00:12:28.399 +for which he's primarily developing Cursorless. + +NOTE Talon GUI + +00:12:28.400 --> 00:12:35.519 +So, I followed the protocol outlined by Tara Roys. + +00:12:35.520 --> 00:12:38.759 +She has a collection of tutorials + +00:12:38.760 --> 00:12:44.599 +on YouTube as well as on GitHub that are quite helpful. + +00:12:44.600 --> 00:12:49.479 +I followed her tutorial for installing + +00:12:49.480 --> 00:12:51.359 +Talon on macOS without any issues, + +00:12:51.360 --> 00:12:55.319 +but allow for half an hour to an hour + +00:12:55.320 --> 00:12:57.719 +to go through the process. When you're done, + +00:12:57.720 --> 00:13:02.199 +you'll have this Talon icon appear in the toolbar + +00:13:02.200 --> 00:13:06.119 +on the Mac. When it has this diagonal line across it, + +00:13:06.120 --> 00:13:09.539 +that means it's in the sleep state. + +00:13:09.540 --> 00:13:13.519 +So, this leads to cascading pull-down menus. + +00:13:13.520 --> 00:13:19.639 +This is it for the GUI interface. 
+ +00:13:19.640 --> 00:13:26.519 +One of your first tasks is to select a large language model + +00:13:26.520 --> 00:13:30.439 +or language model that will be used to interpret + +00:13:30.440 --> 00:13:35.179 +the sounds that you generate as words. + +00:13:35.180 --> 00:13:38.959 +And the other kind of key feature is that there's a, + +00:13:38.960 --> 00:13:43.399 +under scripting, there's a view log pull-down + +00:13:43.400 --> 00:13:48.399 +that opens up a window displaying the log file. + +00:13:48.400 --> 00:13:52.879 +Whenever you make a change in a Talon configuration file, + +00:13:52.880 --> 00:13:55.079 +that change is implemented immediately. + +00:13:55.080 --> 00:13:57.599 +You do not have to restart Talon + +00:13:57.600 --> 00:14:02.539 +to get the change to take effect. + +00:14:02.540 --> 00:14:04.759 +So, this is an example of a Talon file. + +00:14:04.760 --> 00:14:10.499 +It has two components. It has a header above the dash that describes + +00:14:10.500 --> 00:14:14.919 +the scope of the commands contained below the dash. + +00:14:14.920 --> 00:14:19.739 +Each command is separated by a blank line. + +00:14:19.740 --> 00:14:24.239 +If a voice command is mapped to multiple actions, + +00:14:24.240 --> 00:14:30.999 +these are listed separately on indented lines + +00:14:31.000 --> 00:14:33.599 +below the first line. + +00:14:33.600 --> 00:14:39.419 +The words that are in square brackets are optional. + +00:14:39.420 --> 00:14:44.319 +So, I have mapped the word toggle voice in, + +00:14:44.320 --> 00:14:46.319 +or the phrase toggle voice in, + +00:14:46.320 --> 00:14:51.279 +to the keyboard shortcut Alt L + +00:14:51.280 --> 00:14:54.999 +in order to toggle on or off voice in. 
+ +00:14:55.000 --> 00:14:57.879 +If I toggle voice in on, + +00:14:57.880 --> 00:15:01.759 +I need to immediately toggle off Talon, + +00:15:01.760 --> 00:15:09.079 +and this is done through this key command for Control T, + +00:15:09.080 --> 00:15:11.079 +which is mapped to speech toggle. + +00:15:11.080 --> 00:15:20.399 +Speech toggle. Then there are, + +00:15:20.400 --> 00:15:24.079 +there's a couple other examples. + +00:15:24.080 --> 00:15:26.439 +So, if there's no header present, + +00:15:26.440 --> 00:15:29.599 +it's an optional feature of Talon files, + +00:15:29.600 --> 00:15:32.639 +then the commands in the file will apply in all situations, + +00:15:32.640 --> 00:15:36.959 +in all modes. Here we have two restrictions. + +00:15:36.960 --> 00:15:38.959 +This is only, these commands will only work + +00:15:38.960 --> 00:15:42.959 +when using the iTerm2 terminal emulator for the Mac, + +00:15:42.960 --> 00:15:48.239 +and then only when the title of the window in iTerm2 + +00:15:48.240 --> 00:15:52.439 +has this particular address, which corresponds to, + +00:15:52.440 --> 00:15:55.559 +which is what appears when I've logged into + +00:15:55.560 --> 00:16:00.059 +the supercomputer at the University of Oklahoma. + +00:16:00.060 --> 00:16:03.479 +So, one of the commands in this file is checkjobs. + +00:16:03.480 --> 00:16:05.539 +It's mapped to an alias, + +00:16:05.540 --> 00:16:10.919 +a bash alias called cj for "check jobs", + +00:16:10.920 --> 00:16:17.079 +which in turn is mapped to a script called checkjobs.sh + +00:16:17.080 --> 00:16:20.399 +that, when it's run, returns a listing + +00:16:20.400 --> 00:16:23.219 +of the pending and running jobs on the supercomputer + +00:16:23.220 --> 00:16:26.080 +in a format that I find pleasing. + +00:16:26.081 --> 00:16:34.559 +So, this backslash n after cj, new line character, + +00:16:34.560 --> 00:16:39.839 +enters the command. So, I don't have to do that + +00:16:39.840 --> 00:16:43.799 +as an additional step. 
And then, likewise, + +00:16:43.800 --> 00:16:46.799 +here's a similar setup for interacting with + +00:16:46.800 --> 00:16:52.499 +a Ubuntu virtual machine. + +NOTE Recommendations + +00:16:52.500 --> 00:16:55.919 +So, in terms of picking up voice computing, + +00:16:55.920 --> 00:16:57.479 +these are my recommendations. + +00:16:57.480 --> 00:16:59.759 +You're going to run into more errors + +00:16:59.760 --> 00:17:01.479 +than you may like initially, + +00:17:01.480 --> 00:17:07.839 +and so you need some patience in dealing with those. + +00:17:07.840 --> 00:17:09.919 +And also, it'll take you a while + +00:17:09.920 --> 00:17:16.799 +to get your head wrapped around Talon and how it works. + +00:17:16.800 --> 00:17:19.439 +You'll definitely want to use these custom commands + +00:17:19.440 --> 00:17:21.479 +to correct the errors or shortcomings + +00:17:21.480 --> 00:17:26.919 +of the language models. And you've seen how, + +00:17:26.920 --> 00:17:29.879 +by opening up projects by voice commands, + +00:17:29.880 --> 00:17:31.359 +you can reduce friction + +00:17:31.360 --> 00:17:36.659 +in terms of restarting work on a project. + +00:17:36.660 --> 00:17:40.399 +You've seen how Voice In is preferred + +00:17:40.400 --> 00:17:44.879 +for more accurate dictation. + +00:17:44.880 --> 00:17:48.079 +I think my error rate is about 1 to 2 percent. + +00:17:48.080 --> 00:17:53.879 +That is, 1 to 2 out of 100 words are incorrect + +00:17:53.880 --> 00:17:56.319 +versus Talon Voice where I think + +00:17:56.320 --> 00:17:59.879 +the error rate is closer to 5 percent. + +00:18:00.840 --> 00:18:04.759 +I have put together contractions also for Talon, + +00:18:04.760 --> 00:18:07.479 +and they can be found here on GitHub. + +00:18:07.480 --> 00:18:12.959 +And I also have a quiz of 600 questions + +00:18:12.960 --> 00:18:17.719 +about some basic Talon commands. 
+ +00:18:17.720 --> 00:18:20.999 +So, I'd like to thank the people who've helped me out + +00:18:21.000 --> 00:18:22.159 +on the Talon Slack channel + +00:18:22.160 --> 00:18:25.799 +and members of the Oklahoma Data Science Workshop + +00:18:25.800 --> 00:18:29.879 +where I gave an hour-long talk on this topic + +00:18:29.880 --> 00:18:30.959 +several weeks ago. + +00:18:30.960 --> 00:18:34.159 +I'd like to thank my friends + +00:18:34.160 --> 00:18:37.399 +at the Berlin and Austin Emacs Meetup + +00:18:37.400 --> 00:18:42.659 +and at the M-x Research Slack channel. + +00:18:42.660 --> 00:18:45.119 +And I thank these grant funding agencies + +00:18:45.120 --> 00:18:48.880 +for supporting my work. I'll be happy to take any questions. diff --git a/2023/info/voice-after.md b/2023/info/voice-after.md index 3810645b..cf6a68eb 100644 --- a/2023/info/voice-after.md +++ b/2023/info/voice-after.md @@ -1,6 +1,292 @@ + +# Transcript + +[[!template text="""Hi, I'm Blaine Mooers. I'm an associate professor""" start="00:00:00.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""of biochemistry at the University of Oklahoma""" start="00:00:04.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Health Sciences Center in Oklahoma City.""" start="00:00:06.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""My lab studies the role of RNA structure in RNA editing.""" start="00:00:09.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""We use X-ray crystallography to study the structures""" start="00:00:12.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""of these RNAs. 
We spend a lot of time in the lab""" start="00:00:17.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""preparing our samples for structural studies,""" start="00:00:19.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and then we also spend a lot of time at the computer""" start="00:00:22.720" video="mainVideo-voice" id="subtitle"]] +[[!template text="""analyzing the resulting data.""" start="00:00:26.720" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I was seeking ways of using voice computing""" start="00:00:29.720" video="mainVideo-voice" id="subtitle"]] +[[!template text="""to try to enhance my productivity.""" start="00:00:33.040" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I divide voice computing into three activities,""" start="00:00:37.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""speech-to-text or dictation, speech-to-commands,""" start="00:00:41.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and speech-to-code. 
I'll be talking about""" start="00:00:44.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""speech-to-text and speech-to-commands today""" start="00:00:47.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""because these are two activities""" start="00:00:50.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that are probably most broadly applicable""" start="00:00:55.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""to the workflows of people attending this conference.""" start="00:00:57.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""This talk will not be about Emacspeak.""" start="00:01:02.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""This is a verbal program for converting text to speech.""" start="00:01:06.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""We're talking about the flow of information""" start="00:01:11.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""opposite direction, speech-to-text.""" start="00:01:13.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""We need an Emacs Listens. We don't have one,""" start="00:01:16.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""so I had to seek help from outside the Emacs world""" start="00:01:20.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""via the Voice In Plus. 
This runs in""" start="00:01:25.480" video="mainVideo-voice" id="subtitle"]] +[[!template text="""the Google Chrome web browser,""" start="00:01:30.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and it's very good for speech-to-text""" start="00:01:33.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and very easy to learn how to use.""" start="00:01:36.720" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It also has some speech-to-commands.""" start="00:01:39.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""However, Talon Voice is much better""" start="00:01:42.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""with the speech-to-commands,""" start="00:01:44.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and it's also great at speech-to-code.""" start="00:01:47.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, the motivations are, obviously, as I mentioned already,""" start="00:01:53.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""for improved productivity.""" start="00:01:57.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, if you're a fast typist""" start="00:01:59.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""who types faster than they can speak,""" start="00:02:00.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""then nonetheless you might still benefit""" start="00:02:05.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""from voice computing when you grow tired of""" start="00:02:07.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""using the keyboard. 
On the other hand,""" start="00:02:09.280" video="mainVideo-voice" id="subtitle"]] +[[!template text="""you might be a slow typist who talks faster""" start="00:02:12.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""than they can type.""" start="00:02:15.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""In this case, you're definitely going to""" start="00:02:17.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""benefit from dictation because you'll be able to""" start="00:02:19.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""encode more words in text documents in a given day.""" start="00:02:22.860" video="mainVideo-voice" id="subtitle"]] +[[!template text="""If you're a coder, then you may get a kick out of""" start="00:02:29.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""opening programs and websites and coding projects""" start="00:02:33.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""by using your voice.""" start="00:02:37.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Then there are health-related reasons.""" start="00:02:39.280" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You may have impaired use of your hands, eyes, or both""" start="00:02:41.720" video="mainVideo-voice" id="subtitle"]] +[[!template text="""due to accident or disease, or you may suffer from""" start="00:02:44.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""a repetitive stress injury. 
Many of us have this""" start="00:02:49.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in a mild but chronic form of it.""" start="00:02:53.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""We can't take a three-month sabbatical from the keyboard""" start="00:02:55.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""without losing our jobs, so these injuries tend to persist.""" start="00:02:59.040" video="mainVideo-voice" id="subtitle"]] +[[!template text="""And then you may have learned""" start="00:03:05.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that it's not good for your health to sit""" start="00:03:06.680" video="mainVideo-voice" id="subtitle"]] +[[!template text="""for prolonged periods of time""" start="00:03:09.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""with your staring at a computer screen.""" start="00:03:11.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can actually dictate to your computer from 20 feet away""" start="00:03:14.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""while looking out the window,""" start="00:03:21.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""thereby giving your lower body a break""" start="00:03:25.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and your eyes a break.""" start="00:03:27.780" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I'm not God, so I have to bring data.""" start="00:03:33.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I have two data points here,""" start="00:03:35.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""the number of words that I wrote in June and July this year""" start="00:03:38.040" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and in September and October.""" start="00:03:42.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I adopted the 
use of voice computing""" start="00:03:45.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in the middle of August. As you can see,""" start="00:03:49.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I got a over three-fold increase in my output.""" start="00:03:53.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So this is the Chrome store website for voice-in.""" start="00:03:58.680" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So it's only available for Google Chrome.""" start="00:04:07.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You just hit the install button to install it.""" start="00:04:11.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""To configure it, you need to select a language.""" start="00:04:13.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It has support for 40 languages""" start="00:04:16.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and it supports about a dozen different dialects of English,""" start="00:04:19.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""including Australian. It works on web pages with text areas,""" start="00:04:23.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""so it works. I use it regularly""" start="00:04:29.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""on Overleaf and 750words.com,""" start="00:04:33.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""a distraction-free environment for writing.""" start="00:04:37.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It also works in webmails. 
It works in Google.""" start="00:04:42.280" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It works in Jupyter Lab, of course,""" start="00:04:46.780" video="mainVideo-voice" id="subtitle"]] +[[!template text="""because that runs in the browser.""" start="00:04:51.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It also works in Jupyter Notebook and Colab Notebook.""" start="00:04:52.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It should work in Cloudmacs.""" start="00:04:58.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I've mapped option-L to opening Voice In""" start="00:05:01.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""when the cursor is on a web page that has a text area.""" start="00:05:04.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So that's the main limiting factor.""" start="00:05:09.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So it has a number of built-in commands.""" start="00:05:16.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can turn it off by saying stop dictation.""" start="00:05:19.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It doesn't distinguish between""" start="00:05:24.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""a command mode and a dictation mode.""" start="00:05:26.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It has undo command. 
When you use a command,""" start="00:05:28.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""copy that to a copy of selection.""" start="00:05:33.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""And the `press` commands are used in the browser,""" start="00:05:36.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""so you press Enter to issue a command or a text""" start="00:05:40.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that has been written in a web form,""" start="00:05:44.840" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and then "press tab" will open up the next tab""" start="00:05:50.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in a web browser. The scroll up and down""" start="00:05:55.280" video="mainVideo-voice" id="subtitle"]] +[[!template text="""will allow you to navigate a web page.""" start="00:05:58.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I've put together a quiz about these commands""" start="00:06:02.380" video="mainVideo-voice" id="subtitle"]] +[[!template text="""so that you can go through this quiz several times""" start="00:06:05.820" video="mainVideo-voice" id="subtitle"]] +[[!template text="""until you get at least 90 percent of them correct,""" start="00:06:09.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""90 percent of the questions correct.""" start="00:06:14.700" video="mainVideo-voice" id="subtitle"]] +[[!template text="""In order to boost your recall of the commands,""" start="00:06:16.680" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I have a Python script that you can probably""" start="00:06:20.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""pound through the quiz with""" start="00:06:23.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in less than a minute, once you know the commands.""" start="00:06:26.560" video="mainVideo-voice" 
id="subtitle"]] +[[!template text="""I also provide an Elisp version of this quiz,""" start="00:06:32.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""but it's a little slower to operate.""" start="00:06:35.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""These are some common errors""" start="00:06:41.740" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that I've run into with Voice In.""" start="00:06:43.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It likes to contract statements like "I will" into "I'll".""" start="00:06:45.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Contractions are not used in formal writing,""" start="00:06:50.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and most of my writing is formal writing, so this annoys me.""" start="00:06:55.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I will show you how I corrected for that problem.""" start="00:07:00.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It also drops the first word in sentences quite often.""" start="00:07:04.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""This might be some speech issue that I have.""" start="00:07:10.040" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It inserts the wrong word because it's not in the dictionary""" start="00:07:13.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that was used to train it. So, for example,""" start="00:07:17.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""the word PyMOL is the name of a molecular graphics program""" start="00:07:22.620" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that we use in our field. 
It doesn't recognize PyMOL.""" start="00:07:26.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Instead, it substitutes in the word "primal".""" start="00:07:31.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Since I don't use "primal" very often,""" start="00:07:34.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I've mapped the word "primal" to "PyMOL"""" start="00:07:38.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in some custom commands I'll talk about in a minute.""" start="00:07:42.300" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Then there's a problem that the commands that exist""" start="00:07:45.660" video="mainVideo-voice" id="subtitle"]] +[[!template text="""might get executed when you speak them when, in fact,""" start="00:07:50.440" video="mainVideo-voice" id="subtitle"]] +[[!template text="""you wanted to use the words in those commands""" start="00:07:54.440" video="mainVideo-voice" id="subtitle"]] +[[!template text="""during your dictation.""" start="00:07:58.840" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So this is a problem, a pitfall of Voice In,""" start="00:08:01.440" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in that it doesn't have a command mode""" start="00:08:07.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that's separate from a dictation mode.""" start="00:08:08.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So you can set up through a very easy-to-use GUI""" start="00:08:14.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""custom voice commands mapped to what you want inserted.""" start="00:08:20.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So this is how misinterpreted words can be corrected.""" start="00:08:26.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You just map the misinterpreted word to the intended 
word.""" start="00:08:32.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can also map the contractions to their expansions.""" start="00:08:35.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I did this for 94 English contractions,""" start="00:08:42.840" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and you can find this on GitHub.""" start="00:08:46.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can also insert acronyms and expand those acronyms.""" start="00:08:50.140" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I apply the same approach to the first names of colleagues.""" start="00:08:56.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I say "expand Fred", for example,""" start="00:09:00.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""to get Fred's first and last name with the spelling""" start="00:09:03.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""of his very long German name.""" start="00:09:07.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can also insert other trivia like favorite URLs.""" start="00:09:12.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can insert a lot of text snippets,""" start="00:09:19.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and so it handles correctly multi-line snippets.""" start="00:09:24.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You just have to enclose them in double quotes.""" start="00:09:34.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can even insert BibTeX cite keys for references""" start="00:09:39.420" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that you use frequently. 
All fields""" start="00:09:45.040" video="mainVideo-voice" id="subtitle"]] +[[!template text="""have certain key references for certain methods or topics.""" start="00:09:46.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Then it has a set of commands that you can customize""" start="00:09:59.420" video="mainVideo-voice" id="subtitle"]] +[[!template text="""for the purpose of speech to commands""" start="00:10:05.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""to get the computer to do something""" start="00:10:08.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""like open up a specific website or save the current writing.""" start="00:10:09.680" video="mainVideo-voice" id="subtitle"]] +[[!template text="""In this case, we have "press" is a mapping of""" start="00:10:15.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""is applied to the command `s` for saving current writing.""" start="00:10:19.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can change the language,""" start="00:10:27.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and you can change the case of the text.""" start="00:10:28.100" video="mainVideo-voice" id="subtitle"]] +[[!template text="""But the speech to command repertoire is quite limited""" start="00:10:37.540" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in Voice In, so it's now time to pick up on Talon Voice.""" start="00:10:41.040" video="mainVideo-voice" id="subtitle"]] +[[!template text="""This is an open source project. 
It's free.""" start="00:10:49.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It is highly configurable via TalonScript,""" start="00:10:54.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""which is a subset of Python.""" start="00:10:57.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can use either TalonScript or Python to configure it,""" start="00:10:58.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""but it's easier to code up your configuration""" start="00:11:03.040" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in TalonScript.""" start="00:11:06.280" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It has a Python interpreter embedded in it,""" start="00:11:08.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""so you don't have to mess around with installing""" start="00:11:10.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""yet another Python interpreter.""" start="00:11:13.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It runs on all platforms, and it has a dictation mode""" start="00:11:14.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that's separate from a command mode.""" start="00:11:21.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You can activate it,""" start="00:11:24.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and it'll be in a listening state asleep.""" start="00:11:25.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You just bark out Talon Wake to start to wake it up,""" start="00:11:31.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and Talon Sleep to have it go into a listening state.""" start="00:11:36.280" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It has a very welcoming community""" start="00:11:43.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in the Talon 
Slack channel.""" start="00:11:47.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Then I need to point out that there's several packages""" start="00:11:50.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that others have developed that run on top of Talon,""" start="00:11:56.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""but one of particular note is by Pokey Rule.""" start="00:11:59.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""He has on his website some really well-done videos""" start="00:12:03.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that demonstrate how he uses Cursorless""" start="00:12:08.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""to move the cursor around using voice commands.""" start="00:12:11.480" video="mainVideo-voice" id="subtitle"]] +[[!template text="""This, however, runs on VS Code.""" start="00:12:17.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""At least that's the text editor""" start="00:12:20.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""for which he's primarily developing Cursorless.""" start="00:12:23.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, I followed the protocol outlined by Tara Roys.""" start="00:12:28.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""She has a collection of tutorials""" start="00:12:35.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""on YouTube as well as on GitHub that are quite helpful.""" start="00:12:38.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I followed her tutorial for installing""" start="00:12:44.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Talon on macOS without any issues,""" start="00:12:49.480" video="mainVideo-voice" id="subtitle"]] +[[!template text="""but allow for half an hour to an hour""" start="00:12:51.360" video="mainVideo-voice" 
id="subtitle"]] +[[!template text="""to go through the process. When you're done,""" start="00:12:55.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""you'll have this Talon icon appear in the toolbar""" start="00:12:57.720" video="mainVideo-voice" id="subtitle"]] +[[!template text="""on the Mac. When it has this diagonal line across it,""" start="00:13:02.200" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that means it's in the sleep state.""" start="00:13:06.120" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, this leads to cascading pull-down menus.""" start="00:13:09.540" video="mainVideo-voice" id="subtitle"]] +[[!template text="""This is it for the GUI interface.""" start="00:13:13.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""One of your first tasks is to select a large language model""" start="00:13:19.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""or language model that will be used to interpret""" start="00:13:26.520" video="mainVideo-voice" id="subtitle"]] +[[!template text="""the sounds that you generate as words.""" start="00:13:30.440" video="mainVideo-voice" id="subtitle"]] +[[!template text="""And the other kind of key feature is that there's a,""" start="00:13:35.180" video="mainVideo-voice" id="subtitle"]] +[[!template text="""under scripting, there's a view log pull-down""" start="00:13:38.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that opens up a window displaying the log file.""" start="00:13:43.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Whenever you make a change in a Talon configuration file,""" start="00:13:48.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that change is implemented immediately.""" start="00:13:52.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You do not have to restart Talon""" start="00:13:55.080" video="mainVideo-voice" id="subtitle"]] +[[!template 
text="""to get the change to take effect.""" start="00:13:57.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, this is an example of a Talon file.""" start="00:14:02.540" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It has two components. It has a header above the dash that describes""" start="00:14:04.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""the scope of the commands contained below the dash.""" start="00:14:10.500" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Each command is separated by a blank line.""" start="00:14:14.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""If a voice command is mapped to multiple actions,""" start="00:14:19.740" video="mainVideo-voice" id="subtitle"]] +[[!template text="""these are listed separately on indented lines""" start="00:14:24.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""below the first line.""" start="00:14:31.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""The words that are in square brackets are optional.""" start="00:14:33.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, I have mapped the word toggle voice in,""" start="00:14:39.420" video="mainVideo-voice" id="subtitle"]] +[[!template text="""or the phrase toggle voice in,""" start="00:14:44.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""to the keyboard shortcut Alt L""" start="00:14:46.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in order to toggle on or off voice in.""" start="00:14:51.280" video="mainVideo-voice" id="subtitle"]] +[[!template text="""If I toggle voice in on,""" start="00:14:55.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I need to immediately toggle off Talon,""" start="00:14:57.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and this is done through this key command for Control T,""" start="00:15:01.760" 
video="mainVideo-voice" id="subtitle"]] +[[!template text="""which is mapped to speech toggle.""" start="00:15:09.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""Speech toggle. Then there are,""" start="00:15:11.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""there's a couple other examples.""" start="00:15:20.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, if there's no header present,""" start="00:15:24.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""it's an optional feature of Talon files,""" start="00:15:26.440" video="mainVideo-voice" id="subtitle"]] +[[!template text="""then the commands in the file will apply in all situations,""" start="00:15:29.600" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in all modes. Here we have two restrictions.""" start="00:15:32.640" video="mainVideo-voice" id="subtitle"]] +[[!template text="""This is only, these commands will only work""" start="00:15:36.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""when using the iTerm2 terminal emulator for the Mac,""" start="00:15:38.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and then only when the title of the window in iTerm2""" start="00:15:42.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""has this particular address, which corresponds to,""" start="00:15:48.240" video="mainVideo-voice" id="subtitle"]] +[[!template text="""which is what appears when I've logged into""" start="00:15:52.440" video="mainVideo-voice" id="subtitle"]] +[[!template text="""the supercomputer at the University of Oklahoma.""" start="00:15:55.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, one of the commands in this file is checkjobs.""" start="00:16:00.060" video="mainVideo-voice" id="subtitle"]] +[[!template text="""It's mapped to an alias,""" start="00:16:03.480" video="mainVideo-voice" id="subtitle"]] +[[!template text="""a bash alias 
called cj for "check jobs",""" start="00:16:05.540" video="mainVideo-voice" id="subtitle"]] +[[!template text="""which in turn is mapped to a script called checkjobs.sh""" start="00:16:10.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""that, when it's run, returns a listing""" start="00:16:17.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""of the pending and running jobs on the supercomputer""" start="00:16:20.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in a format that I find pleasing.""" start="00:16:23.220" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, this backslash n after cj, new line character,""" start="00:16:26.081" video="mainVideo-voice" id="subtitle"]] +[[!template text="""enters the command. So, I don't have to do that""" start="00:16:34.560" video="mainVideo-voice" id="subtitle"]] +[[!template text="""as an additional step. And then, likewise,""" start="00:16:39.840" video="mainVideo-voice" id="subtitle"]] +[[!template text="""here's a similar setup for interacting with""" start="00:16:43.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""a Ubuntu virtual machine.""" start="00:16:46.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, in terms of picking up voice computing,""" start="00:16:52.500" video="mainVideo-voice" id="subtitle"]] +[[!template text="""these are my recommendations.""" start="00:16:55.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You're going to run into more errors""" start="00:16:57.480" video="mainVideo-voice" id="subtitle"]] +[[!template text="""than you may like initially,""" start="00:16:59.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and so you need some patience in dealing with those.""" start="00:17:01.480" video="mainVideo-voice" id="subtitle"]] +[[!template text="""And also, it'll take you a while""" start="00:17:07.840" video="mainVideo-voice" id="subtitle"]] +[[!template 
text="""to get your head wrapped around Talon and how it works.""" start="00:17:09.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You'll definitely want to use these custom commands""" start="00:17:16.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""to correct the errors or shortcomings""" start="00:17:19.440" video="mainVideo-voice" id="subtitle"]] +[[!template text="""of the language models. And you've seen how,""" start="00:17:21.480" video="mainVideo-voice" id="subtitle"]] +[[!template text="""by opening up projects by voice commands,""" start="00:17:26.920" video="mainVideo-voice" id="subtitle"]] +[[!template text="""you can reduce friction""" start="00:17:29.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""in terms of restarting work on a project.""" start="00:17:31.360" video="mainVideo-voice" id="subtitle"]] +[[!template text="""You've seen how Voice In is preferred""" start="00:17:36.660" video="mainVideo-voice" id="subtitle"]] +[[!template text="""for more accurate dictation.""" start="00:17:40.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I think my error rate is about 1 to 2 percent.""" start="00:17:44.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""That is, 1 to 2 out of 100 words are incorrect""" start="00:17:48.080" video="mainVideo-voice" id="subtitle"]] +[[!template text="""versus Talon Voice where I think""" start="00:17:53.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""the error rate is closer to 5 percent.""" start="00:17:56.320" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I have put together contractions also for Talon,""" start="00:18:00.840" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and they can be found here on GitHub.""" start="00:18:04.760" video="mainVideo-voice" id="subtitle"]] +[[!template text="""And I also have a quiz of 600 questions""" start="00:18:07.480" video="mainVideo-voice" 
id="subtitle"]] +[[!template text="""about some basic Talon commands.""" start="00:18:12.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""So, I'd like to thank the people who've helped me out""" start="00:18:17.720" video="mainVideo-voice" id="subtitle"]] +[[!template text="""on the Talon Slack channel""" start="00:18:21.000" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and members of the Oklahoma Data Science Workshop""" start="00:18:22.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""where I gave an hour-long talk on this topic""" start="00:18:25.800" video="mainVideo-voice" id="subtitle"]] +[[!template text="""several weeks ago.""" start="00:18:29.880" video="mainVideo-voice" id="subtitle"]] +[[!template text="""I'd like to thank my friends""" start="00:18:30.960" video="mainVideo-voice" id="subtitle"]] +[[!template text="""at the Berlin and Austin Emacs Meetup""" start="00:18:34.160" video="mainVideo-voice" id="subtitle"]] +[[!template text="""and at the M-x Research Slack channel.""" start="00:18:37.400" video="mainVideo-voice" id="subtitle"]] +[[!template text="""And I thank these grant funding agencies""" start="00:18:42.660" video="mainVideo-voice" id="subtitle"]] +[[!template text="""for supporting my work. I'll be happy to take any questions.""" start="00:18:45.120" video="mainVideo-voice" id="subtitle"]] + Questions or comments? Please e-mail [emacsconf-org-private@gnu.org](mailto:emacsconf-org-private@gnu.org?subject=Comment%20for%20EmacsConf%202022%20voice%3A%20Enhancing%20productivity%20with%20voice%20computing) diff --git a/2023/info/voice-before.md b/2023/info/voice-before.md index 08e9ad79..eed3b66c 100644 --- a/2023/info/voice-before.md +++ b/2023/info/voice-before.md @@ -8,12 +8,12 @@ The following image shows where the talk is in the schedule for Sat 2023-12-02. 
Format: 19-min talk; Q&A: BigBlueButton conference room Etherpad: Discuss on IRC: [#emacsconf-dev](https://chat.emacsconf.org/?join=emacsconf,emacsconf-dev) -Status: Processing uploaded video +Status: Now playing on the conference livestream
Times in different timezones:
Saturday, Dec 2 2023, ~10:20 AM - 10:40 AM EST (US/Eastern)
which is the same as:
Saturday, Dec 2 2023, ~9:20 AM - 9:40 AM CST (US/Central)
Saturday, Dec 2 2023, ~8:20 AM - 8:40 AM MST (US/Mountain)
Saturday, Dec 2 2023, ~7:20 AM - 7:40 AM PST (US/Pacific)
Saturday, Dec 2 2023, ~3:20 PM - 3:40 PM UTC
Saturday, Dec 2 2023, ~4:20 PM - 4:40 PM CET (Europe/Paris)
Saturday, Dec 2 2023, ~5:20 PM - 5:40 PM EET (Europe/Athens)
Saturday, Dec 2 2023, ~8:50 PM - 9:10 PM IST (Asia/Kolkata)
Saturday, Dec 2 2023, ~11:20 PM - 11:40 PM +08 (Asia/Singapore)
Sunday, Dec 3 2023, ~12:20 AM - 12:40 AM JST (Asia/Tokyo)
Find out how to watch and participate
- + # Description \ No newline at end of file -- cgit v1.2.3