Diffstat (limited to '2023/talks/voice.md')
| -rw-r--r-- | 2023/talks/voice.md | 216 | 
1 files changed, 216 insertions, 0 deletions
diff --git a/2023/talks/voice.md b/2023/talks/voice.md
new file mode 100644
index 00000000..2c5c537f
--- /dev/null
+++ b/2023/talks/voice.md
@@ -0,0 +1,216 @@
+[[!meta title="Enhancing productivity with voice computing"]]
+[[!meta copyright="Copyright © 2023 Blaine Mooers"]]
+[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]
+
+<!-- Initially generated with emacsconf-publish-talk-page and then left alone for manual editing -->
+<!-- You can manually edit this file to update the abstract, add links, etc. --->
+
+# Enhancing productivity with voice computing
+Blaine Mooers (he/him/his) - Pronunciation: pronounced like "moors", blaine-mooers(at)ouhsc.edu, <https://basicsciences.ouhsc.edu/bmb/Faculty/bio_details/mooers-blaine-hm-phd>, <https://twitter.com/BlaineMooers>, <https://github.com/MooersLab>, <https://codeberg.org/MooersLab>, mastodon(at)bhmooers
+
+[[!inline pages="internal(2023/info/voice-before)" raw="yes"]]
+
+[[!template id="help"
+volunteer=""
+summary="Q&A could be indexed with chapter markers"
+tags="help_with_chapter_markers"
+message="""The Q&A session for this talk does not have chapter markers yet.
+Would you like to help? See [[help_with_chapter_markers]] for more details. You can use the vidid="voice-qanda" if adding the markers to this wiki page, or e-mail your chapter notes to <emacsconf-submit@gnu.org>."""]]
+
+Voice computing uses speech recognition software to convert speech into text, commands, or code.
+While there is a venerated program called Emacspeak for converting text into speech, an
+"EmacsListens" for converting speech into text is not available yet.
+The Emacs Wiki describes the underdeveloped situation for speech-to-text in Emacs.
+I will explain how two external software packages convert my speech into text and computer
+commands that can be used with Emacs.
+
+First, I present some motivations for using voice computing.
+These can be divided into two categories: productivity improvement and health-related issues.
+In this second category, there is the underappreciated cure for "standing desk envy";
+the cure is achievable with a large dose of voice computing while standing.
+
+I found one software package (Voice In) to be quite accurate for speech-to-text or dictation
+(Voice In Plus, <https://dictanote.co/voicein/plus/>), but less versatile for speech-to-commands.
+I have used this package daily, and I found a three-fold increase in my daily word count almost
+immediately.
+Of course, there are limits here; you can talk for only so many hours per day.
+
+Second, I found another software package that has a less accurate language model (Talon Voice,
+<http://talon.wiki/>) but that supports custom commands that can be executed anywhere you can
+place the cursor, including in virtual machines and on remote servers.
+Talon Voice will appeal to those who like to tinker with configuration files, yet it is easy to
+use.
+
+I will explain how I have integrated these two packages into my workflow.
+I have developed a library of commands that expand 94 English contractions when spoken.
+This library eliminates tedious downstream editing of formal prose where I do not use
+contractions.
+The library is available on GitHub for both Voice In Plus
+(<https://github.com/mooersLab/voice-in-plus-contractions>) and Talon Voice
+(<https://github.com/MooersLab/talon-contractions>).
+
+I also supply interactive quizzes for mastering the basic Voice In commands
+(<https://github.com/MooersLab/voice-in-basics-quiz>) and the Talon Voice phonetic alphabet
+(<https://github.com/MooersLab/talon-voice-quizzes/qTalonAlphabet.py>).
+I learned the Talon alphabet in one day by taking the quiz at spaced intervals.
+The quiz took only 60 seconds to complete when I was proficient.
+
+I store my daily writing in a multi-file LaTeX document with one tex file per day.
+The 365 files are compiled into one PDF per year, which usually runs to about 1000 pages.
+I am not going to push my luck with a multiyear document.
+Each month is a chapter. The resulting PDF is a breeze to scroll and search.
+It has an autogenerated table of contents and an index. I have posted
+a blank version for 2023 and another for the upcoming year
+(<https://github.com/MooersLab/diary2024inLaTeX>).
+One could take a similar approach in Org mode by using Bastian Bechtold's
+org-journal package (<https://github.com/bastibe/org-journal>).
+
+I gave a 60-minute talk on this topic to the Oklahoma Data Science Workshop
+on 2023 Nov. 16 (<https://mediasite.ouhsc.edu/Mediasite/Channel/python>).
+This workshop meets once a month and is for people interested in data
+science and scientific computing. You do not have to be an Oklahoma
+resident to attend. Send me e-mail if you want to be added to our mailing list.
+
+# About the speaker:
+
+I am an Associate Professor of Biochemistry at the University of
+Oklahoma Health Sciences Center. I use X-ray crystallography to study
+the structures of RNA, proteins, and protein-drug complexes. I have
+been using Python and LaTeX for a dozen years, and Jupyter Notebooks
+since 2013. I have been using Emacs every day for 2.5 years. I
+discovered voice computing this summer when my chronic repetitive
+stress injury flared up while entering data in a spreadsheet. I
+tripled my daily word count by using speech-to-text, and I get a
+kick out of running remote computers by speech-to-command.
+# Discussion
+
+## Questions and answers
+
+-   Q: Comment: there is a text-to-command tool called clipea that
+    would be awesome <https://github.com/dave1010/clipea>
+    -   A: <https://sourceforge.net/projects/sox/> is also a good
+        alternative.
+-   Q: Could you comment on how speaking vs. typing affects your
+    logic/content? Thanks!
+    -   A: I find that this is like the difference between writing your thoughts
+        down on a blank piece of printer paper versus paper bound in a
+        leather notebook. I do not think there is any real difference. I know
+        that some people believe there is a definite difference, but I am
+        using this for the purpose of generating the
+        first draft. Because my skill at using my voice to edit my
+        text is still not very well developed, I am still more efficient using
+        the keyboard for that stage.
+
+        So the hardest part about
+        writing generally is getting the first crappy draft written. I
+        have found that dictation is perfectly fine for that phase. I
+        find it actually very conducive to just getting the text out. The
+        biggest problem that most of us have is applying our internal editor, and
+        that inhibits us from generating words in a free-flowing
+        fashion.
+
+        I generally do my generative writing--actually, I divide my writing
+        into two categories: generative writing (generating the first crappy
+        draft) and then rewriting. Rewriting is probably 80-90% of writing,
+        where you can go back and rework the order of the sentences, order of
+        paragraphs, the order of words in a sentence and so forth. It is
+        really hard work that is best done later in the day when I am more
+        awake. I do my generative writing first thing in the morning when I
+        feel horrible. That is when my internal editor is not very awake and I
+        can get more words out past that gatekeeper. I can do this
+        sitting down. I can do this standing up. I can do this 20 feet away
+        from my computer looking out the window to give my eyes a break. I find
+        it very enjoyable to use it in this fashion. The downside is
+        that I wind up generating three times as much text. That makes for
+        three times as much work when it comes to rewriting the text, and that
+        means I am using the keyboard a lot later on in the day.
+
+        I have not made any progress on recovering from my own repetitive
+        stress injury. I hope that I will add the use of voice commands,
+        speech-to-commands, for editing the text in the future, and I will
+        eventually give my hands more of a break.
+
+        This allows you to actually separate those two activities not only by
+        time... So many professional writers will spend several hours in the
+        morning doing the generative part and then they will spend the rest of
+        the day rewriting. They have separated these two activities temporally.
+        What most people actually do is start the generative part,
+        write one sentence, and then apply that internal editor
+        right away because they want to write the first draft as a perfect
+        version, as a final draft, and that is what slows them down
+        dramatically.
+
+        This also allows you to separate these two activities in terms of
+        modality. You are going to do the generative writing by Voice In, the
+        rewriting by keyboard. I think this is like what most people... One way
+        that many people can get into using speech-to-text in a productive way
+        that sounds great...
+    -   A: (not the author, just an audience member): So, for example, when
+        you're talking, you have an immense feeling of the topic you
+        have. You can close your eyes and do your body gestures to
+        manipulate a concept or idea, and you have... I just feel you
+        are more creative than when just tapping. You definitely have a
+        speed advantage over tapping, but the more important thing is
+        that you use your body as a whole to interact with those ideas.
+        [this one is done via voice...]
+        -   but typing is definitely good for accurate control, such as
+            M-x some-command ...
+-   Q: Have you tried the ChatGPT voice chat interface? If so, how has
+    your experience of it been? As someone experienced with voice
+    control, I am interested to hear your thoughts, performance relative to
+    the open source tools in particular.
+    -   A: I do not have much experience with that particular software. I have
+        used Whisper a little bit, and so that is related. Of course, you have
+        this problem of lag. I find that Whisper is good for spitting out a
+        sentence, maybe for a docstring in a programming file. I find that it
+        is very prone to hallucinations. I find myself spending half my
+        time deleting the hallucinations, and I feel like the net gain is
+        diminished as a result, or there is not much of a net gain in terms of
+        what I am getting out of it.
+-   Q: Are any of these voice command/dictation tools freemium?
+    -   A: To be able to add custom commands [in Voice In], you have to pay
+        $48 a year. The Talon Voice software is free, and the only
+        limitation there is access to the language model. If you want to get
+        the beta version, you need to subscribe to Patreon to support the
+        developer. I did that, and I really did not find much of
+        an improvement. I really do not intend to do that in the future.
+        But otherwise in Talon Voice, everything is open and free. The Slack
+        community is incredibly welcoming. Its parallels with
+        the Emacs community are pretty striking.
+-   Q: How good is Talon compared to Whisper?
+    - A: With Talon, I find that the first part of the sentence will
+        be fairly accurate when I am doing dictation, and then towards
+        the end, the errors... In general, I think its error rate is
+        about five words out of 100 or so. Whisper is
+        wonderful because it will insert punctuation for you, but I
+        guess its errors are longer, and it will hallucinate full
+        sentences for you. So they both have significant error rates.
+        They are just different kinds of errors. Hopefully, both over
+        time... [Talon] errors are generally shorter in extent. It does
+        not hallucinate for as long.
+- Q: Are any of those voice command/dictation tools libre? I cannot find that information on the web.
+  - (not the speaker):
+    - this FAQ <https://talon.wiki/faq/> says that Talon Voice is closed source
+    - Talon Voice is non-free <https://talonvoice.com/EULA.txt>
+    - Mistral 7B is under the Apache 2.0 license, i.e., a permissive license
+
+
+## Notes
+
+- From the speaker: I really appreciate the high level of accuracy that I am getting from
+Voice In. I would use Talon Voice for dictation, but at this point,
+there is a significant difference between the level of accuracy of
+Voice In versus Talon Voice. It is a large enough difference that I will
+probably use Voice In for a while until I can figure out how to get
+Talon Voice to generate more accurate text.
+-   When you do Org mode and you have the bullets, it allows you to naturally shard your thoughts in a way that is really easy to edit. ... It has a
+summarizing capability. It allows you to pull back and get an
+overview.
+- Great stuff, definitely going to test-drive Talon
+
+
+[[!inline pages="internal(2023/info/voice-after)" raw="yes"]]
+
+[[!inline pages="internal(2023/info/voice-nav)" raw="yes"]]
+
+
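
The contraction expansion described in the abstract (a library of commands that expand 94 English contractions when spoken) can be illustrated in plain Python. This is only a sketch under stated assumptions: the mapping below is a small hypothetical sample, the name `expand_contractions` is invented for illustration, and none of it is taken from the speaker's repositories, which use the custom-command formats of Voice In Plus and Talon Voice rather than Python post-processing.

```python
import re

# Hypothetical sample mapping; the library described in the talk covers
# 94 contractions. Keys are lowercase so lookups can ignore case.
CONTRACTIONS = {
    "can't": "cannot",
    "don't": "do not",
    "doesn't": "does not",
    "i'm": "I am",
    "it's": "it is",
    "won't": "will not",
}

# One alternation over all keys (longest first), anchored on word boundaries
# so that a key is never matched inside a longer word.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Return text with each known contraction replaced by its expansion."""

    def _replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


if __name__ == "__main__":
    print(expand_contractions("I don't think it's hard."))
```

A fixed table like this has to pick one expansion for ambiguous contractions: "it's" can stand for "it is" or "it has", so text expanded this way may still need a proofreading pass.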
