From 8a80c4593f066f986e02ca5233fc5a59097d5a67 Mon Sep 17 00:00:00 2001
From: Sacha Chua
Date: Fri, 8 Dec 2023 09:34:18 -0500
Subject: update pads and include original links for now

---
 2023/talks/voice.md | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

(limited to '2023/talks/voice.md')

diff --git a/2023/talks/voice.md b/2023/talks/voice.md
index 625dac97..3d96a967 100644
--- a/2023/talks/voice.md
+++ b/2023/talks/voice.md
@@ -75,6 +75,132 @@
 discovered voice computing this summer when my chronic repetitive stress injury
 flared up while entering data in a spreadsheet. I tripled my daily word count by
 using the speech-to-text, and I get a kick out of running remote computers by speech-to-command.
+# Discussion
+
+## Questions and answers
+
+- Q: Comment: there is a text-to-command thing called clipea that
+  would be awesome.
+    - A: also a good
+      alternative.
+- Q: Could you comment on how speaking vs. typing affects your
+  logic/content? Thanks!
+    - A: I find that this is like the difference between writing your thoughts
+      down on a blank piece of printer paper versus paper bound in a
+      leather notebook. I do not think there is any real difference. I know
+      that some people believe there is a definite difference, but I am
+      using this for the purpose of generating the first draft. Because
+      my skill at using my voice to edit my text is still not very well
+      developed, I am still more efficient using the keyboard for that
+      stage.
+
+      So the hardest part about writing generally is getting the first
+      crappy draft written. I have found that dictation is perfectly
+      fine for that phase. I find it actually very conducive to just
+      getting the text out. The biggest problem that most of us have is
+      applying our internal editor, which inhibits us from
+      generating words in a free-flowing
+      fashion.
+
+      I generally do my generative writing--actually, I divide my writing
+      into two categories: generative writing (generating the first crappy
+      draft) and then rewriting. Rewriting is probably 80-90% of writing,
+      where you can go back and rework the order of the sentences, the order of
+      paragraphs, the order of words in a sentence, and so forth. It is
+      really hard work that is best done later in the day when I am more
+      awake. I do my generative writing first thing in the morning when I
+      feel horrible. That is when my internal editor is not very awake and I
+      can get more words out, more words past that gatekeeper. I can do this
+      sitting down. I can do this standing up. I can do this 20 feet away
+      from my computer, looking out the window to give my eyes a break. I find
+      it is just very enjoyable to use it in this fashion. The downside is
+      that I wind up generating three times as much text. That makes for
+      three times as much work when it comes to rewriting the text, and that
+      means I am using the keyboard a lot later on in the day.
+
+      I have not made any progress on recovering from my own repetitive
+      stress injury. I hope that I will add the use of voice commands,
+      speech-to-command, for editing the text in the future and I will
+      eventually give my hands more of a break.
+
+      This allows you to actually separate those two activities not only by
+      time... So many professional writers will spend several hours in the
+      morning doing the generative part and then they will spend the rest of
+      the day rewriting. They have separated these two activities temporally.
+      What most people actually do is they do the generative part and
+      then they write one sentence, and they apply that internal editor
+      right away because they want to write the first draft as a perfect
+      version, as a final draft, and that is what slows them down
+      dramatically.
+
+      This also allows you to separate these two activities in terms of
+      modality.
+      You are going to do the generative writing by Voice In, the
+      rewriting by keyboard. I think this is like what most people... One way
+      that many people can get into using speech-to-text in a productive way
+      that sounds great...
+    - A: (not the author, just an audience member): So, for example, when
+      you're talking, you have an immense feeling of the topic you
+      have. You can close your eyes and use body gestures to
+      manipulate a concept or idea, and you have... I just feel you
+      feel more creative than just tapping. Definitely you have much
+      more of a speed advantage over tapping, but the more important thing is
+      you use your body as a whole to interact with those ideas.
+      \[this one is done via voice...\]
+        - but typing is definitely good for accurate control, such as
+          M-x some-command ...
+- Q: Have you tried the ChatGPT voice chat interface? If so, how has
+  your experience of it been? As someone experienced with voice
+  control, interested to hear your thoughts, performance relative to
+  the open source tools in particular.
+    - A: I do not have much experience with that particular software. I have
+      used Whisper a little bit, and so that is related. Of course, you have
+      this problem of lag. I find that Whisper is good for spitting out a
+      sentence, maybe for a docstring in a programming file. I find that it
+      is very prone to hallucinations. I find myself spending half my
+      time deleting the hallucinations, and I feel like the net gain is
+      diminished as a result, or there is not much of a net gain in terms of
+      what I am getting out of it.
+- Q: Are any of these voice command/dictation tools freemium?
+    - A: To be able to add custom commands, you have to pay
+      $48 a year. The Talon Voice software is free and the only
+      limitation there is access to the language model. If you want to get
+      the beta version, you need to subscribe to Patreon to support the
+      developer. I did that, and I really did not find much of
+      an improvement.
+      I really do not intend to do that in the future.
+      But otherwise in Talon Voice, everything is open and free. The Slack
+      community is incredibly welcoming. Its parallels with
+      the Emacs community are pretty striking.
+- Q: How good is Talon compared to Whisper?
+    - A: With Talon, I find that the first part of the sentence will
+      be fairly accurate when I am doing dictation, and then towards
+      the end, the errors... In general, I think its error rate is
+      about five words out of 100 or so. Whisper is
+      wonderful because it will insert punctuation for you, but I
+      guess its errors are longer, and it will hallucinate full
+      sentences for you. So they both have significant error rates.
+      They are just different kinds of errors. Hopefully, both over
+      time... [Talon] errors are generally shorter in extent. It does
+      not hallucinate for as long.
+- Q: Are any of those voice command/dictation tools libre? I cannot find that information on the web.
+    - (not the speaker):
+        - this FAQ says that Talon Voice is closed source
+        - Talon Voice is non-free
+        - Mistral 7B is under the Apache 2.0 license, i.e. a permissive license
+
+
+## Notes
+
+- From the speaker: I really appreciate the high level of accuracy that I am getting from
+  Voice In. I would use Talon Voice for dictation, but at this point,
+  there is a significant difference between the level of accuracy of
+  Voice In versus Talon Voice. It is a large enough difference that I'll
+  probably use Voice In for a while until I can figure out how to get
+  Talon Voice to generate more accurate text.
+- When you do Org mode and you have the bullets, it allows you to naturally shard
+  your thoughts in a way that is really easy to edit. ... It has a summarizing
+  capability. It allows you to, you know, pull back and get an overview.
+- Great stuff, definitely going to test-drive Talon
+
 [[!inline pages="internal(2023/info/voice-after)" raw="yes"]]
-- 
cgit v1.2.3