Diffstat (limited to '2023/talks/voice.md')
-rw-r--r-- 2023/talks/voice.md 126
1 files changed, 126 insertions, 0 deletions
diff --git a/2023/talks/voice.md b/2023/talks/voice.md
index 625dac97..3d96a967 100644
--- a/2023/talks/voice.md
+++ b/2023/talks/voice.md
@@ -75,6 +75,132 @@ discovered voice computing this summer when my chronic repetitive
stress injury flared up while entering data in a spreadsheet. I
tripled my daily word count by using the speech-to-text, and I get a
kick out of running remote computers by speech-to-command.
+# Discussion
+
+## Questions and answers
+
+- Q: Comment: there is a text-to-command tool called clipea that
+ would be awesome <https://github.com/dave1010/clipea>
+ - A: <https://sourceforge.net/projects/sox/> is also a good
+ alternative.
+- Q: Could you comment on how speaking vs. typing affects your
+ logic/content? Thanks!
+ - A: I find that this is like the difference between writing your thoughts
+ down on a blank piece of printer paper versus paper bound in a
+ leather notebook. I do not think there is any real difference. I know
+ that some people believe there is a definite difference, but I am
+ using this for the purpose of generating the first draft. My skills
+ with using my voice to edit my text are still not very well
+ developed, so I am still more efficient using the keyboard for that
+ stage.
+
+ So the hardest part about
+ writing generally is getting the first crappy draft written. I
+ have found that dictation is perfectly fine for that phase. I
+ find it actually very conducive for just getting the text out. The
+ biggest problem that most of us have is applying our internal editor and
+ that inhibits us from generating words in a free-flowing
+ fashion.
+
+ I generally do my generative writing--actually, I divide my writing
+ into two categories: generative writing (generating the first crappy
+ draft) and then rewriting. Rewriting is probably 80-90% of writing
+ where you can go back and rework the order of the sentences, order of
+ paragraphs, the order of words in a sentence and so forth. It is
+ really hard work that is best done later in the day when I am more
+ awake. I do my generative writing first thing in the morning when I
+ feel horrible. That is when my internal editor is not very awake and I
+ can get more words out, more words past that gatekeeper. I can do this
+ sitting down. I can do this standing up. I can do this 20 feet away
+ from my computer looking out the window to give my eyes a break. I find
+ it is just very enjoyable to use it in this fashion. The downside is
+ that I wind up generating three times as much text. That makes for
+ three times as much work when it comes to rewriting the text, and that
+ means I am using the keyboard a lot later on in the day.
+
+ I have not made any progress on recovering from my own repetitive
+ stress injury. I hope that I will add the use of voice commands,
+ speech-to-commands, for editing the text in the future and I will
+ eventually give my hands more of a break.
+
+ This allows you to actually separate those two activities not only by
+ time... So many professional writers will spend several hours in the
+ morning doing the generative part and then they will spend the rest of
+ the day rewriting. They have separated these two activities temporally.
+ What most people actually do is they do the generative part and
+ then they write one sentence, and they apply that internal editor
+ right away because they want to write the first draft as a perfect
+ version, as a final draft, and that is what slows them down
+ dramatically.
+
+ This also allows you to separate these two activities in terms of
+ modality. You are going to do the generative writing by Voice In, the
+ rewriting by keyboard. I think this is like what most people... One way
+ that many people can get into using speech-to-text in a productive way
+ that sounds great...
+ - A: (not the author, just an audience member): So, for example, when
+ you\'re talking, you have an immense feeling of the topic you
+ have. You can close your eyes and do your body gestures to
+ manipulate a concept or idea, and you have\... I just feel you
+ feel more creative than just tapping. Definitely you have much
+ more of a speed advantage over tapping, but the more important thing is
+ you use your body as a whole to interact with those ideas.
+ \[this one is done via voice\...\]
+ - but typing is definitely good for accurate control, such as
+ M-x some-command \...
+- Q: Have you tried the ChatGPT voice chat interface? If so, how has
+ your experience of it been? As someone experienced with voice
+ control, I am interested to hear your thoughts, particularly on
+ performance relative to the open source tools.
+ - A: I do not have much experience with that particular software. I have
+ used Whisper a little bit, and so that is related. Of course, you have
+ this problem of lag. I find that Whisper is good for spitting out a
+ sentence, maybe for a docstring in a programming file. I find that it
+ is very prone to hallucinations. I find myself spending half my
+ time deleting the hallucinations, and I feel like the net gain is
+ diminished as a result, or there is not much of a net gain in terms of
+ what I am getting out of it.
+- Q: Are any of these voice command/dictation tools freemium?
+ - A: To be able to add custom commands, you have to pay
+ $48 a year. The Talon Voice software is free and the only
+ limitation there is access to the language model. If you want to get
+ the beta version, you need to subscribe to Patreon to support the
+ developer. I did that, and I really did not find much of
+ an improvement. I really do not intend to do that in the future.
+ But otherwise in Talon Voice, everything is open and free. The Slack
+ community is incredibly welcoming. Its parallels with
+ the Emacs Community are pretty striking.
+- Q: How good is Talon compared to Whisper?
+ - A: With Talon, I find that the first part of the sentence will
+ be fairly accurate when I am doing dictation, and then towards
+ the end, the errors... In general, I think its error rate is
+ about five words out of 100 or so that will be wrong. Whisper is
+ wonderful because it will insert punctuation for you, but I
+ guess its errors are longer, in that it will hallucinate full
+ sentences for you. So they both have significant error rates.
+ They are just different kinds of errors. Hopefully, both over
+ time... [Talon] errors are generally shorter in extent. It does
+ not hallucinate as long.
+- Q: Are any of those voice command/dictation tools libre? I cannot find that information on the web.
+ - (not the speaker):
+ - this FAQ <https://talon.wiki/faq/> says that Talon Voice is closed source
+ - Talon Voice is non-free <https://talonvoice.com/EULA.txt>
+ - Mistral 7B is under the Apache 2.0 license, i.e. a permissive license
+
+
+## Notes
+
+- From the speaker: I really appreciate the high level of accuracy that I am getting from
+Voice In. I would use Talon Voice for dictation, but at this point,
+there is a significant difference between the level of accuracy of
+Voice In versus Talon Voice. It's a large enough difference that I'll
+probably use Voice In for a while until I can figure out how to get
+Talon Voice to generate more accurate text.
+- When you do Org mode and you have the bullets, it allows you to naturally shard your thoughts in a way that is really easy to edit. ... It has a
+summarizing capability. It allows you to, you know, pull back and get an
+overview.
+- Great stuff, definitely going to test-drive Talon
+
[[!inline pages="internal(2023/info/voice-after)" raw="yes"]]