diff options
Diffstat (limited to '')
-rw-r--r-- | 2024/info/p-search-after.md | 51 |
1 files changed, 13 insertions, 38 deletions
diff --git a/2024/info/p-search-after.md b/2024/info/p-search-after.md index b6aa194a..779ffbb2 100644 --- a/2024/info/p-search-after.md +++ b/2024/info/p-search-after.md @@ -1,13 +1,10 @@ <!-- Automatically generated by emacsconf-publish-after-page --> -<div class="transcript transcript-mainVideo"><a name="p-search-mainVideo-transcript"></a> -# Transcript +<div class="transcript transcript-mainVideo"><a name="p-search-mainVideo-transcript"></a><h1>Transcript</h1> -[[!template new="1" text="""Search in daily workflows""" start="00:00:00.000" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""Hello, my name is Zachary Romero, and today I'll be going""" start="00:00:00.000" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Search in daily workflows""" start="00:00:00.000" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""Hello, my name is Zachary Romero, and today I'll be going""" start="00:00:00.000" video="mainVideo-p-search" id="subtitle"]] [[!template text="""over p-search, a local search engine in Emacs.""" start="00:00:03.400" video="mainVideo-p-search" id="subtitle"]] [[!template text="""Search these days is everywhere in software, from text editors,""" start="00:00:08.116" video="mainVideo-p-search" id="subtitle"]] [[!template text="""to IDEs, to most online websites. These tools tend to fall""" start="00:00:12.399" video="mainVideo-p-search" id="subtitle"]] @@ -28,9 +25,7 @@ [[!template text="""online services such as Google, GitHub,""" start="00:01:15.640" video="mainVideo-p-search" id="subtitle"]] [[!template text="""SourceGraph for code.""" start="00:01:18.766" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Problems with editor search tools""" start="00:01:24.200" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""The kind of search feature that editors""" start="00:01:24.200" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Problems with editor search tools""" start="00:01:24.200" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""The kind of search feature that editors""" start="00:01:24.200" video="mainVideo-p-search" id="subtitle"]] [[!template text="""usually have have a lot of downsides to them. For one, a lot""" start="00:01:28.840" video="mainVideo-p-search" id="subtitle"]] [[!template text="""of times you don't know the exact search string you're""" start="00:01:36.720" video="mainVideo-p-search" id="subtitle"]] [[!template text="""searching for. Some complicated term like this""" start="00:01:38.840" video="mainVideo-p-search" id="subtitle"]] @@ -68,9 +63,7 @@ [[!template text="""100 times more relevant? These tools usually don't provide""" start="00:03:43.096" video="mainVideo-p-search" id="subtitle"]] [[!template text="""such information.""" start="00:03:52.280" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Information retrieval""" start="00:03:58.233" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""There's a field called information retrieval,""" start="00:03:58.233" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Information retrieval""" start="00:03:58.233" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""There's a field called information retrieval,""" start="00:03:58.233" video="mainVideo-p-search" id="subtitle"]] [[!template text="""and this deals with this exact problem.""" start="00:04:00.395" video="mainVideo-p-search" id="subtitle"]] [[!template text="""You have lots of data you're searching for.""" start="00:04:02.617" video="mainVideo-p-search" id="subtitle"]] [[!template text="""How do you construct a search query?""" start="00:04:04.719" video="mainVideo-p-search" id="subtitle"]] @@ -82,9 +75,7 @@ [[!template text="""searching in Emacs, then drawing inspiration from this""" start="00:04:28.160" video="mainVideo-p-search" id="subtitle"]] [[!template text="""field is necessary.""" start="00:04:31.880" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Search engine in Emacs: the index""" start="00:04:34.296" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""The first aspect of information retrieval is the index.""" start="00:04:34.296" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Search engine in Emacs: the index""" start="00:04:34.296" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""The first aspect of information retrieval is the index.""" start="00:04:34.296" video="mainVideo-p-search" id="subtitle"]] [[!template text="""The reverse index is what search engines use to find results really fast.""" start="00:04:41.384" video="mainVideo-p-search" id="subtitle"]] [[!template text="""Essentially, it's a map of search term""" start="00:04:46.609" video="mainVideo-p-search" id="subtitle"]] [[!template text="""to locations where that term is located.""" start="00:04:51.455" video="mainVideo-p-search" id="subtitle"]] @@ -111,18 +102,14 @@ [[!template text="""Definitely, it can search a few pretty big size""" start="00:06:15.960" video="mainVideo-p-search" id="subtitle"]] [[!template text="""repositories.""" start="00:06:19.240" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Search engine in Emacs: Ranking""" start="00:06:21.757" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""Next main task. We decided not to use an""" start="00:06:21.757" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Search engine in Emacs: Ranking""" start="00:06:21.757" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""Next main task. We decided not to use an""" start="00:06:21.757" video="mainVideo-p-search" id="subtitle"]] [[!template text="""index. Next task is how do we rank search results? So there's""" start="00:06:24.800" video="mainVideo-p-search" id="subtitle"]] [[!template text="""two main algorithms that are used these days. The first""" start="00:06:29.960" video="mainVideo-p-search" id="subtitle"]] [[!template text="""one is tf-idf, which stands for term frequency, inverse""" start="00:06:33.440" video="mainVideo-p-search" id="subtitle"]] [[!template text="""target frequency. Then there's BM25, which is sort of a""" start="00:06:36.520" video="mainVideo-p-search" id="subtitle"]] [[!template text="""modified tf-idf algorithm.""" start="00:06:43.040" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""tf-idf: term-frequency x inverse-document-frequency""" start="00:06:43.553" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""tf-idf, without going into""" start="00:06:43.553" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""tf-idf: term-frequency x inverse-document-frequency""" start="00:06:43.553" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""tf-idf, without going into""" start="00:06:43.553" video="mainVideo-p-search" id="subtitle"]] [[!template text="""too much detail, essentially multiplies two terms. One""" start="00:06:45.680" video="mainVideo-p-search" id="subtitle"]] [[!template text="""is the term frequency, and then you multiply it by the""" start="00:06:49.160" video="mainVideo-p-search" id="subtitle"]] [[!template text="""inverse document frequency. The term frequency is a""" start="00:06:51.880" video="mainVideo-p-search" id="subtitle"]] @@ -140,9 +127,7 @@ [[!template text="""few documents, they're weighted a lot more. So the more""" start="00:07:35.680" video="mainVideo-p-search" id="subtitle"]] [[!template text="""those rare words occur, they boost the score higher.""" start="00:07:37.680" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""BM25""" start="00:07:41.160" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""BM25 is a modification of this. It's essentially TF, it's""" start="00:07:41.160" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""BM25""" start="00:07:41.160" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""BM25 is a modification of this. It's essentially TF, it's""" start="00:07:41.160" video="mainVideo-p-search" id="subtitle"]] [[!template text="""essentially the previous one, except it dampens out terms""" start="00:07:48.840" video="mainVideo-p-search" id="subtitle"]] [[!template text="""that occur more often. Imagine you have a bunch of""" start="00:07:53.120" video="mainVideo-p-search" id="subtitle"]] [[!template text="""documents. One has a term 10 times, one has a term, that same""" start="00:07:55.440" video="mainVideo-p-search" id="subtitle"]] @@ -156,9 +141,7 @@ [[!template text="""we can combine this together and create a simple search""" start="00:08:36.800" video="mainVideo-p-search" id="subtitle"]] [[!template text="""mechanism.""" start="00:08:40.080" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Searching with p-search""" start="00:08:41.200" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""Here we're in the directory for the Emacs source code.""" start="00:08:41.200" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Searching with p-search""" start="00:08:41.200" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""Here we're in the directory for the Emacs source code.""" start="00:08:41.200" video="mainVideo-p-search" id="subtitle"]] [[!template text="""Let's say we want to search for the display code. We""" start="00:08:47.440" video="mainVideo-p-search" id="subtitle"]] [[!template text="""run the p-search command, starting the search engine. It""" start="00:08:53.480" video="mainVideo-p-search" id="subtitle"]] [[!template text="""opens up. We notice it has three sections, the candidate""" start="00:08:58.680" video="mainVideo-p-search" id="subtitle"]] @@ -184,9 +167,7 @@ [[!template text="""these results and make sense of it.""" start="00:10:31.280" video="mainVideo-p-search" id="subtitle"]] [[!template text="""So that's p-search in a nutshell.""" start="00:10:34.320" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Flight AF 447""" start="00:10:41.457" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""Next, I wanted to talk about the story of Flight 447.""" start="00:10:41.457" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Flight AF 447""" start="00:10:41.457" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""Next, I wanted to talk about the story of Flight 447.""" start="00:10:41.457" video="mainVideo-p-search" id="subtitle"]] [[!template text="""Flight 447 going from Rio de Janeiro to Paris""" start="00:10:45.983" video="mainVideo-p-search" id="subtitle"]] [[!template text="""crashed somewhere in the Atlantic Ocean""" start="00:10:49.327" video="mainVideo-p-search" id="subtitle"]] [[!template text="""on June 1st, 2009, killing everyone on board.""" start="00:10:51.510" video="mainVideo-p-search" id="subtitle"]] @@ -259,9 +240,7 @@ [[!template text="""search. In the end, the wreckage was found at a point close to""" start="00:15:57.000" video="mainVideo-p-search" id="subtitle"]] [[!template text="""the center here, thus validating this methodology.""" start="00:16:02.000" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Modifying priors""" start="00:16:06.771" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""We can see the power of this Bayesian search methodology""" start="00:16:06.771" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Modifying priors""" start="00:16:06.771" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""We can see the power of this Bayesian search methodology""" start="00:16:06.771" video="mainVideo-p-search" id="subtitle"]] [[!template text="""in the way that we could take information from all the sources we had.""" start="00:16:10.333" video="mainVideo-p-search" id="subtitle"]] [[!template text="""We could draw analogies to similar situations.""" start="00:16:14.000" video="mainVideo-p-search" id="subtitle"]] [[!template text="""We can quantify these, combine them into a model,""" start="00:16:19.238" video="mainVideo-p-search" id="subtitle"]] @@ -322,9 +301,7 @@ [[!template text="""list of results that's tailor-made to the thing you're""" start="00:20:35.160" video="mainVideo-p-search" id="subtitle"]] [[!template text="""searching for.""" start="00:20:37.640" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Importance""" start="00:20:40.405" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""There's a couple of other features I""" start="00:20:40.405" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Importance""" start="00:20:40.405" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""There's a couple of other features I""" start="00:20:40.405" video="mainVideo-p-search" id="subtitle"]] [[!template text="""want to go through. One thing is that each of these priors,""" start="00:20:41.640" video="mainVideo-p-search" id="subtitle"]] [[!template text="""you can specify the importance. In other words, how""" start="00:20:49.080" video="mainVideo-p-search" id="subtitle"]] [[!template text="""important is this particular piece of information to your""" start="00:20:55.840" video="mainVideo-p-search" id="subtitle"]] @@ -337,9 +314,7 @@ [[!template text="""the word display in it are rated much higher.""" start="00:21:28.080" video="mainVideo-p-search" id="subtitle"]] [[!template text="""With this, we're able to fine-tune the results that we get.""" start="00:21:28.129" video="mainVideo-p-search" id="subtitle"]] -[[!template new="1" text="""Complement or inverse""" start="00:21:38.560" video="mainVideo-p-search" id="subtitle"]] - -[[!template text="""Another thing you can do is that you can add the complement or""" start="00:21:38.560" video="mainVideo-p-search" id="subtitle"]] +<div class="transcript-heading">[[!template new="1" text="""Complement or inverse""" start="00:21:38.560" video="mainVideo-p-search" id="subtitle"]]</div>[[!template text="""Another thing you can do is that you can add the complement or""" start="00:21:38.560" video="mainVideo-p-search" id="subtitle"]] [[!template text="""the inverse of certain queries. Let's say you want to""" start="00:21:45.640" video="mainVideo-p-search" id="subtitle"]] [[!template text="""search for display, but you don't want it to contain the word""" start="00:21:49.760" video="mainVideo-p-search" id="subtitle"]] [[!template text="""frame. With the complement option on, when we create this""" start="00:21:53.240" video="mainVideo-p-search" id="subtitle"]] |