WEBVTT captioned by sachac

00:00:00.000 --> 00:00:04.359
Hi, I'm Blaine Mooers. I'm an associate professor

00:00:04.360 --> 00:00:06.519
of biochemistry at the University of Oklahoma

00:00:06.520 --> 00:00:09.319
Health Sciences Center in Oklahoma City.

00:00:09.320 --> 00:00:12.959
My lab studies the role of RNA structure in RNA editing.

00:00:12.960 --> 00:00:17.199
We use X-ray crystallography to study the structures

00:00:17.200 --> 00:00:19.919
of these RNAs. We spend a lot of time in the lab

00:00:19.920 --> 00:00:22.719
preparing our samples for structural studies,

00:00:22.720 --> 00:00:26.719
and then we also spend a lot of time at the computer

00:00:26.720 --> 00:00:29.719
analyzing the resulting data.

00:00:29.720 --> 00:00:33.039
I was seeking ways of using voice computing

00:00:33.040 --> 00:00:37.399
to try to enhance my productivity.

00:00:37.400 --> 00:00:41.319
I divide voice computing into three activities,

00:00:41.320 --> 00:00:44.959
speech-to-text or dictation, speech-to-commands,

00:00:44.960 --> 00:00:47.639
and speech-to-code. I'll be talking about

00:00:47.640 --> 00:00:50.159
speech-to-text and speech-to-commands today

00:00:50.160 --> 00:00:55.079
because these are two activities

00:00:55.080 --> 00:00:57.319
that are probably most broadly applicable

00:00:57.320 --> 00:01:02.559
to the workflows of people attending this conference.

00:01:02.560 --> 00:01:06.799
This talk will not be about Emacspeak.

00:01:06.800 --> 00:01:11.359
This is a verbal program for converting text to speech.

00:01:11.360 --> 00:01:13.319
We're talking about the flow of information

00:01:13.320 --> 00:01:16.519
in the opposite direction, speech-to-text.

00:01:16.520 --> 00:01:20.599
We need an Emacs Listens. We don't have one,

00:01:20.600 --> 00:01:25.479
so I had to seek help from outside the Emacs world

00:01:25.480 --> 00:01:30.639
via Voice In Plus. This runs in

00:01:30.640 --> 00:01:33.639
the Google Chrome web browser,

00:01:33.640 --> 00:01:36.719
and it's very good for speech-to-text

00:01:36.720 --> 00:01:39.519
and very easy to learn how to use.

00:01:39.520 --> 00:01:41.999
It also has some speech-to-commands.

00:01:42.000 --> 00:01:44.799
However, Talon Voice is much better

00:01:44.800 --> 00:01:47.559
with the speech-to-commands,

00:01:47.560 --> 00:01:53.519
and it's also great at speech-to-code.

NOTE Motivations

00:01:53.520 --> 00:01:57.239
So, the motivations are, obviously, as I mentioned already,

00:01:57.240 --> 00:01:59.159
for improved productivity.

00:01:59.160 --> 00:02:00.399
So, if you're a fast typist

00:02:00.400 --> 00:02:05.199
who types faster than they can speak,

00:02:05.200 --> 00:02:07.079
then nonetheless you might still benefit

00:02:07.080 --> 00:02:09.279
from voice computing when you grow tired of

00:02:09.280 --> 00:02:12.199
using the keyboard. On the other hand,

00:02:12.200 --> 00:02:15.199
you might be a slow typist who talks faster

00:02:15.200 --> 00:02:17.519
than they can type.

00:02:17.520 --> 00:02:19.759
In this case, you're definitely going to

00:02:19.760 --> 00:02:22.859
benefit from dictation because you'll be able to

00:02:22.860 --> 00:02:29.359
encode more words in text documents in a given day.

00:02:29.360 --> 00:02:33.639
If you're a coder, then you may get a kick out of

00:02:33.640 --> 00:02:36.999
opening programs and websites and coding projects

00:02:37.000 --> 00:02:39.279
by using your voice.

00:02:39.280 --> 00:02:41.719
Then there are health-related reasons.

00:02:41.720 --> 00:02:44.599
You may have impaired use of your hands, eyes, or both

00:02:44.600 --> 00:02:49.199
due to accident or disease, or you may suffer from

00:02:49.200 --> 00:02:53.519
a repetitive stress injury. Many of us have

00:02:53.520 --> 00:02:55.759
a mild but chronic form of it.

00:02:55.760 --> 00:02:59.039
We can't take a three-month sabbatical from the keyboard

00:02:59.040 --> 00:03:05.519
without losing our jobs, so these injuries tend to persist.

00:03:05.520 --> 00:03:06.679
And then you may have learned

00:03:06.680 --> 00:03:09.959
that it's not good for your health to sit

00:03:09.960 --> 00:03:11.919
for prolonged periods of time

00:03:11.920 --> 00:03:14.919
while staring at a computer screen.

00:03:14.920 --> 00:03:21.799
You can actually dictate to your computer from 20 feet away

00:03:21.800 --> 00:03:24.999
while looking out the window,

00:03:25.000 --> 00:03:27.779
thereby giving your lower body a break

00:03:27.780 --> 00:03:33.239
and your eyes a break.

NOTE Data

00:03:33.240 --> 00:03:35.639
I'm not God, so I have to bring data.

00:03:35.640 --> 00:03:38.039
I have two data points here,

00:03:38.040 --> 00:03:42.399
the number of words that I wrote in June and July this year

00:03:42.400 --> 00:03:45.159
and in September and October.

00:03:45.160 --> 00:03:49.519
I adopted the use of voice computing

00:03:49.520 --> 00:03:53.919
in the middle of August. As you can see,

00:03:53.920 --> 00:03:58.679
I got more than a three-fold increase in my output.

NOTE Voice In in the Chrome Store

00:03:58.680 --> 00:04:07.119
So this is the Chrome store website for Voice In.

00:04:07.120 --> 00:04:11.119
So it's only available for Google Chrome.

00:04:11.120 --> 00:04:13.239
You just hit the install button to install it.

00:04:13.240 --> 00:04:16.639
To configure it, you need to select a language.

00:04:16.640 --> 00:04:19.559
It has support for 40 languages

00:04:19.560 --> 00:04:23.119
and it supports about a dozen different dialects of English,

00:04:23.120 --> 00:04:29.959
including Australian. It works on web pages with text areas.

00:04:29.960 --> 00:04:33.319
I use it regularly

00:04:33.320 --> 00:04:37.879
on Overleaf and 750words.com,

00:04:37.880 --> 00:04:42.279
a distraction-free environment for writing.

00:04:42.280 --> 00:04:46.239
It also works in webmails. It works in Google.

00:04:46.780 --> 00:04:51.319
It works in JupyterLab, of course,

00:04:51.320 --> 00:04:52.879
because that runs in the browser.

00:04:52.880 --> 00:04:57.999
It also works in Jupyter Notebook and Colab Notebook.

00:04:58.000 --> 00:05:01.319
It should work in Cloudmacs.

00:05:01.320 --> 00:05:04.159
I've mapped Option-L to opening Voice In

00:05:04.160 --> 00:05:09.119
when the cursor is on a web page that has a text area.

00:05:09.120 --> 00:05:16.879
So that's the main limiting factor.

NOTE Built-in commands in Voice In Plus

00:05:16.880 --> 00:05:19.159
So it has a number of built-in commands.

00:05:19.160 --> 00:05:24.879
You can turn it off by saying stop dictation.

00:05:24.880 --> 00:05:26.119
It doesn't distinguish between

00:05:26.120 --> 00:05:28.799
a command mode and a dictation mode.

00:05:28.800 --> 00:05:33.599
It has an undo command. You use the command

00:05:33.600 --> 00:05:36.919
"copy that" to copy a selection.

00:05:36.920 --> 00:05:40.079
And the `press` commands are used in the browser,

00:05:40.080 --> 00:05:44.839
so you say "press enter" to submit a command or text

00:05:44.840 --> 00:05:50.319
that has been written in a web form,

00:05:50.320 --> 00:05:55.279
and then "press tab" will open up the next tab

00:05:55.280 --> 00:05:58.599
in a web browser. The scroll up and down

00:05:58.600 --> 00:06:02.379
will allow you to navigate a web page.

00:06:02.380 --> 00:06:05.819
I've put together a quiz about these commands

00:06:05.820 --> 00:06:09.559
so that you can go through this quiz several times

00:06:09.560 --> 00:06:14.699
until you get at least 90 percent of them correct,

00:06:14.700 --> 00:06:16.679
90 percent of the questions correct,

00:06:16.680 --> 00:06:20.599
in order to boost your recall of the commands.

00:06:20.600 --> 00:06:23.799
I have a Python script that you can probably

00:06:23.800 --> 00:06:26.559
pound through the quiz with

00:06:26.560 --> 00:06:32.159
in less than a minute, once you know the commands.

00:06:32.160 --> 00:06:35.599
I also provide an Elisp version of this quiz,

00:06:35.600 --> 00:06:41.739
but it's a little slower to operate.
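
NOTE
Editor's sketch, not the speaker's script: a minimal command quiz grader in Python.
The question prompts below are hypothetical; only the spoken commands come from the talk.
The 0.9 threshold mirrors the 90 percent target mentioned above.
```python
# Hypothetical question bank; the spoken commands are the ones named in the talk.
QUIZ = [
    ("Turn off Voice In", "stop dictation"),
    ("Reverse the last change", "undo"),
    ("Copy the current selection", "copy that"),
    ("Submit the text in a web form", "press enter"),
]
def grade(answers, quiz=QUIZ):
    """Return the fraction of answers that match the expected commands."""
    correct = sum(
        given.strip().lower() == expected
        for given, (_prompt, expected) in zip(answers, quiz)
    )
    return correct / len(quiz)
def passed(answers, quiz=QUIZ, threshold=0.9):
    """True once the score reaches the threshold (90 percent by default)."""
    return grade(answers, quiz) >= threshold
print(grade(["stop dictation", "undo", "copy that", "press tab"]))
```
Repeating the quiz until passed() returns True is one way to follow the advice above.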

NOTE Common errors

00:06:41.740 --> 00:06:43.399
These are some common errors

00:06:43.400 --> 00:06:45.399
that I've run into with Voice In.

00:06:45.400 --> 00:06:50.319
It likes to contract statements like "I will" into "I'll".

00:06:50.320 --> 00:06:55.599
Contractions are not used in formal writing,

00:06:55.600 --> 00:07:00.359
and most of my writing is formal writing, so this annoys me.

00:07:00.360 --> 00:07:04.759
I will show you how I corrected for that problem.

00:07:04.760 --> 00:07:10.039
It also drops the first word in sentences quite often.

00:07:10.040 --> 00:07:13.359
This might be some speech issue that I have.

00:07:13.360 --> 00:07:17.599
It inserts the wrong word because it's not in the dictionary

00:07:17.600 --> 00:07:22.619
that was used to train it. So, for example,

00:07:22.620 --> 00:07:26.919
the word PyMOL is the name of a molecular graphics program

00:07:26.920 --> 00:07:31.639
that we use in our field. It doesn't recognize PyMOL.

00:07:31.640 --> 00:07:34.239
Instead, it substitutes in the word "primal".

00:07:34.240 --> 00:07:38.399
Since I don't use "primal" very often,

00:07:38.400 --> 00:07:42.299
I've mapped the word "primal" to "PyMOL"

00:07:42.300 --> 00:07:45.659
in some custom commands I'll talk about in a minute.

00:07:45.660 --> 00:07:50.439
Then there's a problem that the commands that exist

00:07:50.440 --> 00:07:54.439
might get executed when you speak them when, in fact,

00:07:54.440 --> 00:07:58.839
you wanted to use the words in those commands

00:07:58.840 --> 00:08:01.439
during your dictation.

00:08:01.440 --> 00:08:07.119
So this is a problem, a pitfall of Voice In,

00:08:07.120 --> 00:08:08.919
in that it doesn't have a command mode

00:08:08.920 --> 00:08:14.759
that's separate from a dictation mode.

NOTE Custom speech-to-text commands

00:08:14.760 --> 00:08:20.319
So you can set up through a very easy-to-use GUI

00:08:20.320 --> 00:08:26.959
custom voice commands mapped to what you want inserted.

00:08:26.960 --> 00:08:32.399
So this is how misinterpreted words can be corrected.

00:08:32.400 --> 00:08:35.759
You just map the misinterpreted word to the intended word.

00:08:35.760 --> 00:08:42.839
You can also map the contractions to their expansions.

00:08:42.840 --> 00:08:46.959
I did this for 94 English contractions,

00:08:46.960 --> 00:08:50.139
and you can find this on GitHub.
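
NOTE
Editor's sketch: Voice In itself is configured through its GUI, so this Python is only an
illustration of the mapping idea (misheard word or contraction mapped to the intended text).
The three contractions shown are a hypothetical sample, not the speaker's list of 94.
```python
import re
# Hypothetical sample table: spoken or misheard form mapped to the intended text.
EXPANSIONS = {
    "I'll": "I will",
    "don't": "do not",
    "can't": "cannot",
    "primal": "PyMOL",
}
def expand(text: str, table=EXPANSIONS) -> str:
    """Replace each whole-word key of table found in text with its value."""
    # Longest keys first so that a longer entry wins over a shorter prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(" + "|".join(map(re.escape, keys)) + r")(?!\w)")
    return pattern.sub(lambda m: table[m.group(1)], text)
print(expand("I'll open primal, but don't close it."))
```
The match is case-sensitive and whole-word only, so "primality" is left untouched.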

00:08:50.140 --> 00:08:56.079
You can also insert acronyms and expand those acronyms.

00:08:56.080 --> 00:09:00.239
I apply the same approach to the first names of colleagues.

00:09:00.240 --> 00:09:03.759
I say "expand Fred", for example,

00:09:03.760 --> 00:09:06.999
to get Fred's first and last name with the spelling

00:09:07.000 --> 00:09:12.599
of his very long German name.

00:09:12.600 --> 00:09:19.399
You can also insert other trivia like favorite URLs.

00:09:19.400 --> 00:09:24.559
You can insert a lot of text snippets,

00:09:24.560 --> 00:09:34.799
and it correctly handles multi-line snippets.

00:09:34.800 --> 00:09:39.419
You just have to enclose them in double quotes.

00:09:39.420 --> 00:09:45.039
You can even insert BibTeX cite keys for references

00:09:45.040 --> 00:09:46.879
that you use frequently. All fields

00:09:46.880 --> 00:09:59.419
have certain key references for certain methods or topics.

00:09:59.420 --> 00:10:05.079
Then it has a set of commands that you can customize

00:10:05.080 --> 00:10:08.199
for the purpose of speech to commands

00:10:08.200 --> 00:10:09.679
to get the computer to do something

00:10:09.680 --> 00:10:15.399
like open up a specific website or save the current writing.

00:10:15.400 --> 00:10:19.919
In this case, "press"

00:10:19.920 --> 00:10:27.759
is applied to Command-S for saving the current writing.

00:10:27.760 --> 00:10:28.099
You can change the language,

00:10:28.100 --> 00:10:37.539
and you can change the case of the text.

NOTE Introducing Talon Voice

00:10:37.540 --> 00:10:41.039
But the speech to command repertoire is quite limited

00:10:41.040 --> 00:10:49.759
in Voice In, so it's now time to pick up on Talon Voice.

00:10:49.760 --> 00:10:54.119
This is an open source project. It's free.

00:10:54.120 --> 00:10:57.399
It is highly configurable via TalonScript,

00:10:57.400 --> 00:10:58.959
which is a subset of Python.

00:10:58.960 --> 00:11:03.039
You can use either TalonScript or Python to configure it,

00:11:03.040 --> 00:11:06.279
but it's easier to code up your configuration

00:11:06.280 --> 00:11:08.399
in TalonScript.

00:11:08.400 --> 00:11:10.759
It has a Python interpreter embedded in it,

00:11:10.760 --> 00:11:12.999
so you don't have to mess around with installing

00:11:13.000 --> 00:11:14.559
yet another Python interpreter.

00:11:14.560 --> 00:11:21.519
It runs on all platforms, and it has a dictation mode

00:11:21.520 --> 00:11:24.599
that's separate from a command mode.

00:11:24.600 --> 00:11:25.599
You can activate it,

00:11:25.600 --> 00:11:31.359
and it'll be in a listening state asleep.

00:11:31.360 --> 00:11:36.279
You just bark out Talon Wake to start to wake it up,

00:11:36.280 --> 00:11:43.799
and Talon Sleep to have it go into a listening state.

00:11:43.800 --> 00:11:47.919
It has a very welcoming community

00:11:47.920 --> 00:11:50.919
in the Talon Slack channel.

00:11:50.920 --> 00:11:56.399
Then I need to point out that there's several packages

00:11:56.400 --> 00:11:59.199
that others have developed that run on top of Talon,

00:11:59.200 --> 00:12:03.079
but one of particular note is by Pokey Rule.

00:12:03.080 --> 00:12:08.119
He has on his website some really well-done videos

00:12:08.120 --> 00:12:11.479
that demonstrate how he uses Cursorless

00:12:11.480 --> 00:12:17.239
to move the cursor around using voice commands.

00:12:17.240 --> 00:12:20.559
This, however, runs on VS Code.

00:12:20.560 --> 00:12:23.359
At least that's the text editor

00:12:23.360 --> 00:12:28.399
for which he's primarily developing Cursorless.

NOTE Talon GUI

00:12:28.400 --> 00:12:35.519
So, I followed the protocol outlined by Tara Roys.

00:12:35.520 --> 00:12:38.759
She has a collection of tutorials

00:12:38.760 --> 00:12:44.599
on YouTube as well as on GitHub that are quite helpful.

00:12:44.600 --> 00:12:49.479
I followed her tutorial for installing

00:12:49.480 --> 00:12:51.359
Talon on macOS without any issues,

00:12:51.360 --> 00:12:55.319
but allow for half an hour to an hour

00:12:55.320 --> 00:12:57.719
to go through the process. When you're done,

00:12:57.720 --> 00:13:02.199
you'll have this Talon icon appear in the toolbar

00:13:02.200 --> 00:13:06.119
on the Mac. When it has this diagonal line across it,

00:13:06.120 --> 00:13:09.539
that means it's in the sleep state.

00:13:09.540 --> 00:13:13.519
So, this leads to cascading pull-down menus.

00:13:13.520 --> 00:13:19.639
This is it for the GUI interface.

00:13:19.640 --> 00:13:26.519
One of your first tasks is to select a large language model

00:13:26.520 --> 00:13:30.439
or language model that will be used to interpret

00:13:30.440 --> 00:13:35.179
the sounds that you generate as words.

00:13:35.180 --> 00:13:38.959
And the other key feature is that,

00:13:38.960 --> 00:13:43.399
under scripting, there's a view log pull-down

00:13:43.400 --> 00:13:48.399
that opens up a window displaying the log file.

00:13:48.400 --> 00:13:52.879
Whenever you make a change in a Talon configuration file,

00:13:52.880 --> 00:13:55.079
that change is implemented immediately.

00:13:55.080 --> 00:13:57.599
You do not have to restart Talon

00:13:57.600 --> 00:14:02.539
to get the change to take effect.

00:14:02.540 --> 00:14:04.759
So, this is an example of a Talon file.

00:14:04.760 --> 00:14:10.499
It has two components. It has a header above the dash that describes

00:14:10.500 --> 00:14:14.919
the scope of the commands contained below the dash.

00:14:14.920 --> 00:14:19.739
Each command is separated by a blank line.

00:14:19.740 --> 00:14:24.239
If a voice command is mapped to multiple actions,

00:14:24.240 --> 00:14:30.999
these are listed separately on indented lines

00:14:31.000 --> 00:14:33.599
below the first line.

00:14:33.600 --> 00:14:39.419
The words that are in square brackets are optional.

00:14:39.420 --> 00:14:44.319
So, I have mapped the word "toggle voice in",

00:14:44.320 --> 00:14:46.319
or the phrase "toggle voice in",

00:14:46.320 --> 00:14:51.279
to the keyboard shortcut Alt-L

00:14:51.280 --> 00:14:54.999
in order to toggle Voice In on or off.

00:14:55.000 --> 00:14:57.879
If I toggle Voice In on,

00:14:57.880 --> 00:15:01.759
I need to immediately toggle off Talon,

00:15:01.760 --> 00:15:09.079
and this is done through this key command for Control T,

00:15:09.080 --> 00:15:11.079
which is mapped to speech toggle.
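
NOTE
Editor's sketch of the file layout just described: header, a lone dash, then commands with
indented action lines. The Talon text is a hypothetical reconstruction; "os: mac" and the
"[please] save file" rule are assumptions added to show a header line and an optional word.
The parser is a toy written for this note and is not how Talon itself reads its files.
```python
# Hypothetical .talon text; the speaker's actual file is not reproduced in the captions.
TALON_TEXT = "\n".join([
    "os: mac",
    "-",
    "toggle voice in:",
    "    key(alt-l)",
    "    key(ctrl-t)",
    "",
    "[please] save file: key(cmd-s)",
])
def parse_talon(text):
    """Split Talon-style text into (header dict, {spoken phrase: [actions]})."""
    lines = text.splitlines()
    # The header is optional: with no lone dash, every line is treated as a command.
    split = lines.index("-") if "-" in lines else -1
    header = {}
    for line in lines[:split + 1]:
        if ":" in line:
            key, value = line.split(":", 1)
            header[key.strip()] = value.strip()
    commands, current = {}, None
    for line in lines[split + 1:]:
        if not line.strip():
            current = None  # a blank line ends the current command
        elif line[0].isspace() and current is not None:
            commands[current].append(line.strip())  # indented action line
        else:
            phrase, _, action = line.partition(":")
            current = phrase.strip()
            commands[current] = [action.strip()] if action.strip() else []
    return header, commands
header, commands = parse_talon(TALON_TEXT)
print(header)
print(commands)
```
Each spoken phrase ends up with its ordered list of actions, mirroring the one-phrase-to-many-actions layout.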

00:15:11.080 --> 00:15:20.399
Speech toggle. Then there are

00:15:20.400 --> 00:15:24.079
a couple of other examples.

00:15:24.080 --> 00:15:26.439
So, if there's no header present,

00:15:26.440 --> 00:15:29.599
it's an optional feature of Talon files,

00:15:29.600 --> 00:15:32.639
then the commands in the file will apply in all situations,

00:15:32.640 --> 00:15:36.959
in all modes. Here we have two restrictions.

00:15:36.960 --> 00:15:38.959
These commands will only work

00:15:38.960 --> 00:15:42.959
when using the iTerm2 terminal emulator for the Mac,

00:15:42.960 --> 00:15:48.239
and then only when the title of the window in iTerm2

00:15:48.240 --> 00:15:52.439
has this particular address,

00:15:52.440 --> 00:15:55.559
which is what appears when I've logged into

00:15:55.560 --> 00:16:00.059
the supercomputer at the University of Oklahoma.

00:16:00.060 --> 00:16:03.479
So, one of the commands in this file is checkjobs.

00:16:03.480 --> 00:16:05.539
It's mapped to an alias,

00:16:05.540 --> 00:16:10.919
a bash alias called cj for "check jobs",

00:16:10.920 --> 00:16:17.079
which in turn is mapped to a script called checkjobs.sh

00:16:17.080 --> 00:16:20.399
that, when it's run, returns a listing

00:16:20.400 --> 00:16:23.219
of the pending and running jobs on the supercomputer

00:16:23.220 --> 00:16:26.080
in a format that I find pleasing.

00:16:26.081 --> 00:16:34.559
So, this backslash n after cj, new line character,

00:16:34.560 --> 00:16:39.839
enters the command. So, I don't have to do that

00:16:39.840 --> 00:16:43.799
as an additional step. And then, likewise,

00:16:43.800 --> 00:16:46.799
here's a similar setup for interacting with

00:16:46.800 --> 00:16:52.499
an Ubuntu virtual machine.
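
NOTE
Editor's sketch of header scoping: a command applies only when every header restriction
matches the current context. This is a simplified model written for this note, not Talon's
implementation. The host name login.example.edu is a placeholder, since the real window
title is not given in the captions. A header value wrapped in slashes is treated as a regex.
```python
import re
# Hypothetical header values; the real title pattern is not given in the talk.
HEADER = {"app": "iterm2", "title": r"/login\.example\.edu/"}
# The trailing newline presses Enter, so the alias runs without an extra step.
COMMANDS = {"check jobs": "cj\n"}
def header_matches(header, context):
    """True when every header line agrees with the current context."""
    for key, wanted in header.items():
        actual = context.get(key, "")
        if len(wanted) > 1 and wanted.startswith("/") and wanted.endswith("/"):
            # A value wrapped in slashes is searched for as a regular expression.
            if not re.search(wanted[1:-1], actual):
                return False
        elif wanted != actual:
            return False
    return True
def text_to_type(phrase, context, header=HEADER, commands=COMMANDS):
    """Return the keystrokes for phrase, or None if the file is out of scope."""
    if not header_matches(header, context):
        return None
    return commands.get(phrase)
print(repr(text_to_type("check jobs", {"app": "iterm2", "title": "me@login.example.edu: ~"})))
```
With an empty header the function matches every context, as described for files without a header.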

NOTE Recommendations

00:16:52.500 --> 00:16:55.919
So, in terms of picking up voice computing,

00:16:55.920 --> 00:16:57.479
these are my recommendations.

00:16:57.480 --> 00:16:59.759
You're going to run into more errors

00:16:59.760 --> 00:17:01.479
than you may like initially,

00:17:01.480 --> 00:17:07.839
and so you need some patience in dealing with those.

00:17:07.840 --> 00:17:09.919
And also, it'll take you a while

00:17:09.920 --> 00:17:16.799
to get your head wrapped around Talon and how it works.

00:17:16.800 --> 00:17:19.439
You'll definitely want to use these custom commands

00:17:19.440 --> 00:17:21.479
to correct the errors or shortcomings

00:17:21.480 --> 00:17:26.919
of the language models. And you've seen how,

00:17:26.920 --> 00:17:29.879
by opening up projects by voice commands,

00:17:29.880 --> 00:17:31.359
you can reduce friction

00:17:31.360 --> 00:17:36.659
in terms of restarting work on a project.

00:17:36.660 --> 00:17:40.399
You've seen how Voice In is preferred

00:17:40.400 --> 00:17:44.879
for more accurate dictation.

00:17:44.880 --> 00:17:48.079
I think my error rate is about 1 to 2 percent.

00:17:48.080 --> 00:17:53.879
That is, 1 to 2 out of 100 words are incorrect

00:17:53.880 --> 00:17:56.319
versus Talon Voice where I think

00:17:56.320 --> 00:17:59.879
the error rate is closer to 5 percent.
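
NOTE
Editor's sketch: the percentages above are the speaker's informal estimates. One standard way
to measure such a rate is the word error rate, the word-level edit distance divided by the
number of words in the reference text. The sample sentences below are made up for illustration.
```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words seen so far and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # reference word dropped
                               current[j - 1] + 1,       # extra word inserted
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1] / len(ref)
print(word_error_rate("we use PyMOL to view RNA", "we use primal to view RNA"))
```
A dropped first word and a substituted word each count as one error under this measure.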

00:18:00.840 --> 00:18:04.759
I have put together contractions also for Talon,

00:18:04.760 --> 00:18:07.479
and they can be found here on GitHub.

00:18:07.480 --> 00:18:12.959
And I also have a quiz of 600 questions

00:18:12.960 --> 00:18:17.719
about some basic Talon commands.

00:18:17.720 --> 00:18:20.999
So, I'd like to thank the people who've helped me out

00:18:21.000 --> 00:18:22.159
on the Talon Slack channel

00:18:22.160 --> 00:18:25.799
and members of the Oklahoma Data Science Workshop

00:18:25.800 --> 00:18:29.879
where I gave an hour-long talk on this topic

00:18:29.880 --> 00:18:30.959
several weeks ago.

00:18:30.960 --> 00:18:34.159
I'd like to thank my friends

00:18:34.160 --> 00:18:37.399
at the Berlin and Austin Emacs Meetup

00:18:37.400 --> 00:18:42.659
and at the M-x Research Slack channel.

00:18:42.660 --> 00:18:45.119
And I thank these grant funding agencies

00:18:45.120 --> 00:18:48.880
for supporting my work. I'll be happy to take any questions.