From 558d28b033396d4384e8cf36a36ae8f3d0c0f9b1 Mon Sep 17 00:00:00 2001
From: Sacha Chua
Date: Tue, 10 Dec 2024 13:15:03 -0500
Subject: add notes to p-search

---
 2024/talks/p-search.md | 176 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)

diff --git a/2024/talks/p-search.md b/2024/talks/p-search.md
index 01e8aed7..59d16379 100644
--- a/2024/talks/p-search.md
+++ b/2024/talks/p-search.md
@@ -53,6 +53,182 @@ tools.
 Code:
+# Discussion
+
+## Questions and answers
+
+- Q: Do you think a reduced version of this functionality could be
+  integrated into isearch? Right now you can turn on various flags
+  when using isearch with the M-s prefix, like M-s SPC to match spaces
+  literally. Is it possible to add a flag to \"search the buffer
+  semantically\"? (Ditto with M-x occur, which is more similar to your
+  buffer-oriented results interface.)
+  - A: It\'s essentially a framework, so you would create a generator
+    for that, but one does not exist yet.
+- Q: Any idea how this would work with personal information like
+  Zettelkastens?
+  - A: It is usable as is, because all the files are in a directory, so
+    you only have to set the files to search in. You can then add
+    information to ignore some files (like daily notes).
+    Documentation is coming.
+- Q: How well does the search work for synonyms, especially if you use
+  different languages?
+  - A: There is an entire field of search devoted to transforming the
+    input word to normalize it (like a plural -\> singular
+    transformation). Currently p-search does not address this.
+  - A: For different languages it gets complicated (vector search is
+    possible, but might be too slow in Elisp).
+- Q: When searching by author, I know authors may set up a new machine
+  and not enter the exact same information. Is this doing anything to
+  combine those into one author?
+  - A: It currently uses the git command, so if you know the emails
+    the author has used, you can add different priors.
+- Q: \"Rak\" is a cool, more powerful grep that may have some good
+  ideas for increasing the value of searches, for example using Raku
+  code while searching. Rak is written in Raku. Have you seen it?
+  - [https://github.com/lizmat/App-Rak](https://github.com/lizmat/App-Rak){rel="noreferrer noopener"}
+  - [https://www.youtube.com/watch?v=YkjGNV4dVio&t=167s&pp=ygURYXBwIHJhayByYWt1IGdyZXA%3D](https://www.youtube.com/watch?v=YkjGNV4dVio&t=167s&pp=ygURYXBwIHJhayByYWt1IGdyZXA%3D){rel="noreferrer noopener"}
+  - A: I have to look into that. A Tree-sitter AST would also be cool
+    to include for better search.
+- Q: Have you thought about integrating results from using cosine
+  similarity with a deep-learning-based vector embedding? This will
+  let us search for \"fruit\" and get back results that have \"apple\"
+  or \"grapes\" in them \-- that kind of thing. It will probably also
+  handle the case of terms that could be abbreviated/formatted
+  differently, like in your initial example.
+  - A: That goes back to semantic search. It can probably be implemented,
+    but would probably also be too slow. And it is hard to get the
+    embeddings and the system running on the machine.
+- Q: I missed the start of the talk, so apologies if this has been
+  covered: is it possible to save/bookmark searches or search
+  templates so they can be used again and again?
+  - A: Exactly. I just recently added bookmarking capabilities, so
+    we can bookmark and rerun our searches from where we left off.
+    I tried to create a one-to-one mapping between the search object
+    and its data representation (there is a command to do this). You
+    get a custom plist that resumes the search where you left off,
+    which can be used to create a command to trigger a prior search.
+- Q: You mentioned candidate generators. Could you explain what
+  the score is assigned to? Is it a line, or whatever the
+  candidate generator produces? How does it work with rg in your demo?
+
+  FOLLOW-UP: How does the git scoring thingy hook into this?
+
+  - A: A candidate generator produces documents. Documents have
+    properties (like an ID and a path). From those you get
+    subproperties like the content of the document. Each candidate
+    generator knows how to search its own sources (emails, buffers,
+    files, URLs, \...). There is only the notion of a document and
+    its score.
+  - Then another method is used to extract the lines that match in
+    the document (to show precisely which lines match).
+
+- Q: Hearing about this makes me think about how nice the emergent
+  workflow with denote is, using easy filtering with orderless. It is
+  really easy to search for file tags, titles, etc. and do things with
+  them. Did this or something like it help or influence the design of
+  p-search?
+  - A: You can search for whatever you want. Nothing is hard-coded
+    (files, directories, tags, titles, \...).
+
+- Q: \[comments from IRC\] git covers the \"multiple
+  names\" thing itself: see .mailmap 10:51:19
+  - this is a git feature, p-search shouldn\'t need to
+    implement it 10:51:34
+  - To me this seems to have similarities to notmuch \--
+    honestly I want notmuch with the p-search UI :) (of course,
+    notmuch uses a Xapian index, because repeatedly grepping all
+    traffic on huge mailing lists would be insane.) 10:55:30
+  - (notmuch also has bookmark-like things as a core
+    feature, but no real weighting like p-search does.) 10:56:07
+  - A: I have not used notmuch, but many extensions are
+    possible. mu4e uses a full index for search. This
+    could be adapted here too, with the SQL database as a source.
+
+- Q: You can search a buffer using ripgrep by feeding it in as stdin
+  to the ripgrep process, can\'t you?
+  - A: Yes, you can. But the aim is to search many different things
+    in Elisp, so p-search has a mechanism anyway to be able
+    to represent anything, including buffers. This is working pretty
+    well.
+
+- Q: Thanks for making this lovely thing; I\'m looking forward to
+  trying it out. It seems modular and well thought out. I have
+  questions about integration and about the interface.
+  - A: project.el is used to search only in the local files of the
+    project (this is the default).
+
+- Q: How happy are you with the interface?
+  - A: p-search goes over all the files, trying to find the best
+    matches. Many features can be added, e.g., to improve debuggability
+    (is this ranked highly due to a bug? due to a high weight? many
+    matching documents?)
+  - A: Hopefully it will be on ELPA at some point with proper
+    documentation.
+
+- Q: Remembering searches is not available everywhere (rg.el? But AI
+  packages like gptel already have it). It is also useful for using the
+  document in the future.
+  - A: Retrieval-augmented generation: p-search could be used for
+    the search, combining it with an AI to fine-tune the search with
+    a Q&A workflow. However, there is currently no API.
+  - (gptel author here: I\'m looking forward to seeing if I can use
+    gptel with p-search)
+  - A: As the results are surprisingly good, why is this not used
+    anywhere else? But there is a lot of setup to get it right. You
+    need something like Emacs with a lot of configuration (transient
+    helps with that) without scaring the users.
+  - Everyone uses Emacs differently, so it is unclear how people will
+    really use it. (PlasmaStrike) For example, consult-omni
+    (elfeed-tube, \...) searches multiple webpages at the same
+    time, with orderless. However, no webpage offers this option.
+    Somehow those tools stay in Emacs only. (Corwin Brust) This is
+    the strength of Emacs: people invest a lot of time to improve
+    their workflow for tomorrow. \[see xkcd on the Emacs learning curve
+    vs nano vs vim\]
+    - [https://github.com/armindarvish/consult-omni](https://github.com/armindarvish/consult-omni){rel="noreferrer noopener"}
+    - [https://github.com/karthink/elfeed-tube](https://github.com/karthink/elfeed-tube){rel="noreferrer noopener"}
+    - [https://www.reddit.com/r/ProgrammerHumor/comments/9d6f19/text_editor_learning_curves_fixed/](https://www.reddit.com/r/ProgrammerHumor/comments/9d6f19/text_editor_learning_curves_fixed/){rel="noreferrer noopener"}
+  - A: Emacs is not the most beginner-friendly, but the solution
+    space is very large.
+  - (Corwin Brust) Emacs supports all approaches and is extensible.
+    (PlasmaStrike) YouTube is much larger, but somehow does not have
+    this nice, sane interface.
+
+- Q: Do you think Emacs being kinda slow will get in the way of
+  being able to run a lot of scoring algorithms?
+  - A: The code is currently dumb in a lot of places (like going over
+    all files to calculate a score), but surprisingly that is not that
+    slow. Elisp enumerating all files and multiplying numbers in the
+    Emacs repo isn\'t really slow. But if you have to search in
+    files, this will be slow without relying on ripgrep or a faster
+    tool. Take for example the search in info files / Elisp info
+    files: the search in Elisp is almost instant. For human-size
+    documents, it is probably fast enough \-- and if not, there is
+    room for optimization. For company-size documents (like repos),
+    it could be too slow.
+
+- Q: When do you have to make something more complicated to scale
+  better?
+  - A: I do not really know yet. I try to automate tasks as much as
+    possible, like in the Emacs configuration meme \"not doing work,
+    I have to do the configuration\". Usually I do not add web-based
+    things into Emacs.
+
+## Notes
+
+- I like the dedicated-buffer interface (I\'m assuming it uses
+  magit-section and transient).
+- Very interesting ideas. I was very happy when I was able
+  to do simple filters with orderless, but this is great \[11:46\]
+- I dunno about you, but I want to start using p-search
+  yesterday.
+- (possibly integrating lsp-based tokens
+  somehow\...) \[11:44\]
+- Awesome job Ryota, thank you for sharing!
+
 [[!inline pages="internal(2024/info/p-search-after)" raw="yes"]]
-- 
cgit v1.2.3