[[!meta title="Enhancing productivity with voice computing"]]
[[!meta copyright="Copyright © 2023 Blaine Mooers"]]
[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]

# Enhancing productivity with voice computing
Blaine Mooers (he/him/his) - Pronunciation: pronounced like "moors", blaine-mooers(at)ouhsc.edu, , , , , mastodon(at)bhmooers

[[!inline pages="internal(2023/info/voice-before)" raw="yes"]]

[[!template id="help" volunteer="" summary="Q&A could be indexed with chapter markers" tags="help_with_chapter_markers" message="""The Q&A session for this talk does not have chapter markers yet. Would you like to help? See [[help_with_chapter_markers]] for more details. You can use the vidid="voice-qanda" if adding the markers to this wiki page, or e-mail your chapter notes to ."""]]

Voice computing uses speech recognition software to convert speech into text, commands, or code. There is a venerated program called Emacspeak for converting text into speech, but an "EmacsListens" for converting speech into text does not exist yet. The Emacs Wiki describes the underdeveloped state of speech-to-text in Emacs. I will explain how two external software packages convert my speech into text and into computer commands that can be used with Emacs.

First, I present some motivations for using voice computing. They fall into two categories: productivity improvement and health-related issues. The second category includes the underappreciated cure for "standing desk envy", which is a large dose of voice computing while standing.

I found one software package, Voice In (Voice In Plus, ), to be quite accurate for speech-to-text, or dictation, but less versatile for speech-to-commands. I have used this package daily, and my daily word count tripled almost immediately. Of course, there are limits here; you can talk for only so many hours per day.
I also found another software package (Talon Voice, ) that has a less accurate language model but supports custom commands. These commands can be executed anywhere you can place the cursor, including in virtual machines and on remote servers. Talon Voice will appeal to those who like to tinker with configuration files, yet it is easy to use. I will explain how I have integrated these two packages into my workflow.

I have developed a library of commands that expand 94 English contractions when spoken. This library eliminates tedious downstream editing of formal prose, in which I do not use contractions. The library is available on GitHub for both Voice In Plus () and Talon Voice (). I also supply interactive quizzes for mastering the basic Voice In commands () and the Talon Voice phonetic alphabet (). I learned the Talon alphabet in one day by taking the quiz at spaced intervals. Once I was proficient, the quiz took only 60 seconds to complete.

I store my daily writing in a multi-file LaTeX document with one tex file per day. The 365 files are compiled into one PDF per year, which usually runs to about 1,000 pages. I am not going to push my luck with a multiyear document. Each month is a chapter. The resulting PDF is a breeze to scroll and search, and it has an autogenerated table of contents and an index. I have posted a blank version for 2023 and another for the upcoming year (). One could take a similar approach in Org mode by using Bastian Bechtold's org-journal package ().

I gave a 60-minute talk on this topic to the Oklahoma Data Science Workshop on November 16, 2023 (). This workshop meets once a month and is for people interested in data science and scientific computing. You do not have to be an Oklahoma resident to attend. Send me an e-mail if you want to be added to our mailing list.

# About the speaker

I am an Associate Professor of Biochemistry at the University of Oklahoma Health Sciences Center.
I use X-ray crystallography to study the structures of RNA, proteins, and protein-drug complexes. I have been using Python and LaTeX for a dozen years, Jupyter Notebooks since 2013, and Emacs every day for 2.5 years. I discovered voice computing this summer when my chronic repetitive stress injury flared up while I was entering data in a spreadsheet. I tripled my daily word count by using speech-to-text, and I get a kick out of running remote computers by speech-to-command.

# Discussion

## Questions and answers

- Q: Comment: there is a text-to-command tool called clipea that would be awesome.
    - A: That is also a good alternative.
- Q: Could you comment on how speaking vs. typing affects your logic/content? Thanks!
    - A: I find that this is like the difference between writing your thoughts down on a blank piece of printer paper versus in a leather-bound notebook. I do not think there is any real difference. I know that some people believe there is a definite difference. However, I am using dictation only to generate the first draft. My skills at using my voice to edit text are still not very well developed, so I am still more efficient using the keyboard for that stage. The hardest part about writing is generally getting the first crappy draft written, and I have found that dictation is perfectly fine for that phase. I actually find it very conducive to just getting the text out. The biggest problem that most of us have is applying our internal editor, which inhibits us from generating words in a free-flowing fashion. I divide my writing into two categories: generative writing (generating the first crappy draft) and rewriting. Rewriting is probably 80-90% of writing. That is where you go back and rework the order of paragraphs, the order of sentences, the order of words in a sentence, and so forth.
It is really hard work that is best done later in the day when I am more awake. I do my generative writing first thing in the morning, when I feel horrible. That is when my internal editor is not very awake, and I can get more words past that gatekeeper. I can do this sitting down. I can do this standing up. I can do this 20 feet away from my computer, looking out the window to give my eyes a break. I find it very enjoyable to work this way. The downside is that I wind up generating three times as much text. That makes for three times as much work when it comes to rewriting, which means I am using the keyboard a lot later in the day. I have not made any progress on recovering from my own repetitive stress injury. In the future, I hope to add speech-to-commands for editing text so that I can eventually give my hands more of a break. This approach lets you separate those two activities not only by time... Many professional writers spend several hours in the morning on the generative part and then spend the rest of the day rewriting; they have separated these two activities temporally. What most people actually do is write one sentence and apply their internal editor right away. They want the first draft to be a perfect, final version, and that is what slows them down dramatically. This approach also lets you separate these two activities by modality: generative writing by Voice In, rewriting by keyboard. I think this is like what most people... One way that many people can get into using speech-to-text in a productive way that sounds great...
    - A (not the speaker, just an audience member): So, for example, when you're talking, you have an immense feeling for the topic.
You can close your eyes and use body gestures to manipulate a concept or idea, and you have... I just feel more creative than when tapping. You definitely have a big speed advantage over tapping, but the more important thing is that you use your body as a whole to interact with those ideas. [This one is done via voice...]
    - But typing is definitely good for accurate control, such as M-x some-command ...
- Q: Have you tried the ChatGPT voice chat interface? If so, how has your experience been? As someone experienced with voice control, I am interested to hear your thoughts, in particular on performance relative to the open-source tools.
    - A: I do not have much experience with that particular software. I have used Whisper a little bit, and that is related. Of course, you have the problem of lag. I find that Whisper is good for spitting out a sentence, maybe for a docstring in a programming file. However, I also find that it is very prone to hallucinations. I spend half my time deleting the hallucinations, so the net gain is diminished, or there is not much of a net gain in terms of what I am getting out of it.
- Q: Are any of these voice command/dictation tools freemium?
    - A: To be able to add custom commands in Voice In, you have to pay $48 a year. The Talon Voice software is free, and the only limitation there is access to the language model. If you want the beta version, you need to subscribe on Patreon to support the developer. I did that, and I really did not find much of an improvement, so I do not intend to do that in the future. Otherwise, in Talon Voice, everything is open and free. The Slack community is incredibly welcoming. Its parallels with the Emacs community are pretty striking.
- Q: How good is Talon compared to Whisper?
    - A: With Talon, when I am doing dictation, I find that the first part of the sentence will be fairly accurate, and then towards the end, the errors...
In general, I think about five words out of 100 or so will be wrong. Whisper is wonderful because it will insert punctuation for you, but its errors are longer; it will hallucinate full sentences. So they both have significant error rates; they are just different kinds of errors. Hopefully, both over time... [Talon's] errors are generally shorter in extent; it does not hallucinate at such length.
- Q: Are any of those voice command/dictation tools libre? I cannot find that information on the web.
    - (not the speaker):
        - This FAQ says that Talon Voice is closed source.
        - Talon Voice is non-free.
        - Mistral 7B is released under the Apache 2.0 license, i.e., a permissive license.

## Notes

- From the speaker: I really appreciate the high level of accuracy that I am getting from Voice In. I would use Talon Voice for dictation, but at this point, Voice In is significantly more accurate than Talon Voice. The difference is large enough that I will probably use Voice In for a while, until I can figure out how to get Talon Voice to generate more accurate text.
- When you use Org mode and you have the bullets, it allows you to naturally shard your thoughts in a way that is really easy to edit. ... It has a summarizing capability. It allows you to pull back and get an overview.
- Great stuff, definitely going to test-drive Talon.

[[!inline pages="internal(2023/info/voice-after)" raw="yes"]]

[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]