<!-- Automatically generated by emacsconf-publish-after-page -->


<a name="p-search-mainVideo-transcript"></a>
# Transcript


[[!template new="1" text="""Search in daily workflows""" start="00:00:00.000" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""Hello, my name is Zachary Romero, and today I'll be going""" start="00:00:00.000" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""over p-search, a local search engine in Emacs.""" start="00:00:03.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Search these days is everywhere in software, from text editors,""" start="00:00:08.116" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""to IDEs, to most online websites. These tools tend to fall""" start="00:00:12.399" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""into one of two categories. One are tools that run locally,""" start="00:00:18.360" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""and work by matching string to text. The most common""" start="00:00:25.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""example of this is grep. In Emacs, there are a lot of""" start="00:00:31.280" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""extensions which provide functionality on top of these""" start="00:00:35.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""tools, such as projectile-grep, deadgrep,""" start="00:00:38.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""consult-ripgrep. Most editors have some sort of""" start="00:00:42.389" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""search current project feature. Most of the time,""" start="00:00:46.850" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""some of these tools have features like regular expressions,""" start="00:00:52.692" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""or you can specify file extension,""" start="00:00:56.394" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""or a directory you want to search in,""" start="00:00:59.216" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""but features are pretty limited.""" start="00:01:01.637" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""The other kind of search we use are usually hosted online,""" start="00:01:03.958" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""and they usually search a vast corpus of data.""" start="00:01:07.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""These are usually proprietary""" start="00:01:12.303" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""online services such as Google, GitHub,""" start="00:01:15.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""SourceGraph for code.""" start="00:01:18.766" video="mainVideo-p-search" id="subtitle"]]

[[!template new="1" text="""Problems with editor search tools""" start="00:01:24.200" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""The kinds of search features that editors""" start="00:01:24.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""usually have have a lot of downsides to them. For one, a lot""" start="00:01:28.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""of times you don't know the exact search string you're""" start="00:01:36.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""searching for. Some complicated term like this""" start="00:01:38.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""high volume demand partner, you know, do you know if...""" start="00:01:42.784" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Are some words abbreviated, is it capitalized,""" start="00:01:46.861" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""is it in kebab case, camel case, snake case?""" start="00:01:49.709" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""You often have to search all these variations.""" start="00:01:53.090" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Another downside is that the search results returned""" start="00:01:57.572" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""contain a lot of noise. For example,""" start="00:02:05.435" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""you may get a lot of test files.""" start="00:02:07.770" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""If the tool hits your vendor directory,""" start="00:02:10.817" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""it may get a bunch of results from libraries""" start="00:02:13.538" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""you're using, most of which are not helpful. Another downside""" start="00:02:17.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""is that the order given is, well, there's no meaning to the""" start="00:02:22.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""order. It's usually just the search order that the tool""" start="00:02:26.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""happens to look in first.""" start="00:02:30.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Another thing is, so when you're searching, you oftentimes""" start="00:02:34.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""have to keep the state of the searches in your head. For""" start="00:02:38.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""example, you try one search, you see the results, find the""" start="00:02:41.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""results you think are relevant, keep them in your head, run""" start="00:02:46.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""search number two, look through the results, kind of""" start="00:02:49.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""combine these different search results in your head until""" start="00:02:52.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""you get an idea of which ones might be relevant.""" start="00:02:56.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Another thing is that the search primitives are fairly limited.""" start="00:02:59.971" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So yeah, you can search regular expressions, but you can't""" start="00:03:04.516" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""really define complex things like, I want to search files in""" start="00:03:10.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""this directory, and this directory, and this directory,""" start="00:03:14.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""except these subdirectories, and except test files, and I""" start="00:03:18.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""only want files with this file extension. Criteria like""" start="00:03:22.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that are really hard to... I'm sure they're possible in tools""" start="00:03:25.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""like grep, but they're pretty hard to construct.""" start="00:03:28.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""And lastly, there's no notion of any relevance. All the""" start="00:03:34.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""results you get back, I mean, you don't know, is the search""" start="00:03:38.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""more relevant? Is it twice as relevant? Is it""" start="00:03:42.040" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""100 times more relevant? These tools usually don't provide""" start="00:03:43.096" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""such information.""" start="00:03:52.280" video="mainVideo-p-search" id="subtitle"]]

[[!template new="1" text="""Information retrieval""" start="00:03:58.233" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""There's a field called information retrieval,""" start="00:03:58.233" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""and this deals with this exact problem.""" start="00:04:00.395" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""You have lots of data you're searching for.""" start="00:04:02.617" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""How do you construct a search query?""" start="00:04:04.719" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""How do you get results back fast? How do you""" start="00:04:09.262" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""rank which ones are most relevant? How do you evaluate""" start="00:04:09.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""your search system to see if it's getting better or worse?""" start="00:04:14.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""There's a lot of work, a lot of books written on the topic of""" start="00:04:20.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""information retrieval. If one wants to improve""" start="00:04:23.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""searching in Emacs, then drawing inspiration from this""" start="00:04:28.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""field is necessary.""" start="00:04:31.880" video="mainVideo-p-search" id="subtitle"]]

[[!template new="1" text="""Search engine in Emacs: the index""" start="00:04:34.296" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""The first aspect of information retrieval is the index.""" start="00:04:34.296" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""The inverted index is what search engines use to find results really fast.""" start="00:04:41.384" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Essentially, it's a map of search term""" start="00:04:46.609" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""to locations where that term is located.""" start="00:04:51.455" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""You'll have all the terms or maybe even parts of""" start="00:04:54.739" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the terms, and then you'll have all the locations where""" start="00:04:57.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""they're located. Any query could easily look up""" start="00:04:59.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""where things are located, join results together, and""" start="00:05:02.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that's how they get the results to be really fast. For this""" start="00:05:05.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""project, I decided to forgo creating an index altogether.""" start="00:05:12.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""An index is pretty complicated to maintain because""" start="00:05:19.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""it always has to be in sync. Any time you open a file and save""" start="00:05:23.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""it, you would have to re-index, you would have to make sure""" start="00:05:27.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that file is re-indexed properly. Then you have the""" start="00:05:29.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""whole issue of, well, if you're searching in Emacs,""" start="00:05:32.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""you have all these projects, this directory,""" start="00:05:36.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that directory, how do you know which? Do you always have to""" start="00:05:38.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""keep them in sync? It's quite a hard task to handle""" start="00:05:42.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that. Then on the other end, tools like ripgrep can""" start="00:05:47.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""search very fast. Even though they can't search maybe on the""" start="00:05:53.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""order of tens of thousands of repositories, for a local""" start="00:05:59.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""setting, they should be plenty fast enough.""" start="00:06:03.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""I benchmarked. Ripgrep, for example, is""" start="00:06:06.040" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""on the order of gigabytes per second.""" start="00:06:12.240" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Definitely, it can search a few pretty big size""" start="00:06:15.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""repositories.""" start="00:06:19.240" video="mainVideo-p-search" id="subtitle"]]
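
The index described above (usually called an inverted index) can be sketched in a few lines of Python. This is only an illustration of the data structure, not something p-search does, since the talk explains that p-search skips the index. The file names and texts are made-up placeholders, and `build_inverted_index` and `lookup` are hypothetical helper names.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in text.lower().split():
            index[term].add(name)
    return index


def lookup(index, query):
    """Return the documents that contain every term of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    # Joining results together is a set intersection over the postings.
    return set.intersection(*postings)


# Placeholder corpus, invented for this example.
documents = {
    "xdisp.c": "redisplay the window and display the glyph rows",
    "buffer.c": "insert text into the buffer",
    "window.c": "split the window and display the buffer",
}
index = build_inverted_index(documents)
print(sorted(lookup(index, "display window")))  # prints ['window.c', 'xdisp.c']
```

Every lookup is a dictionary access plus a set intersection, which is why an index answers queries quickly. The cost is that `index` has to be rebuilt or patched whenever a file changes, which is the maintenance burden mentioned above.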

[[!template new="1" text="""Search engine in Emacs: Ranking""" start="00:06:21.757" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""Next main task. We decided not to use an""" start="00:06:21.757" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""index. Next task is how do we rank search results? So there's""" start="00:06:24.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""two main algorithms that are used these days. The first""" start="00:06:29.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""one is tf-idf, which stands for term frequency, inverse""" start="00:06:33.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""document frequency. Then there's BM25, which is sort of a""" start="00:06:36.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""modified tf-idf algorithm.""" start="00:06:43.040" video="mainVideo-p-search" id="subtitle"]]

[[!template new="1" text="""tf-idf: term-frequency x inverse-document-frequency""" start="00:06:43.553" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""tf-idf, without going into""" start="00:06:43.553" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""too much detail, essentially multiplies two terms. One""" start="00:06:45.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""is the term frequency, and then you multiply it by the""" start="00:06:49.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""inverse document frequency. The term frequency is a""" start="00:06:51.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""measure of how often that search term occurs. The""" start="00:06:54.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""inverse document frequency is a measure of how much""" start="00:06:58.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""information that term provides. If the term occurs a lot,""" start="00:07:00.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""then it gets a higher score in the term frequency section.""" start="00:07:06.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""But if it's a common word that exists in a lot of documents,""" start="00:07:08.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""then its inverse document frequency goes down.""" start="00:07:12.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""It kind of scores it less. You'll find that words like the,""" start="00:07:13.901" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""in, is, these really common words, since they occur""" start="00:07:20.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""everywhere, their inverse document frequency is""" start="00:07:25.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""essentially zero. They don't really count towards a""" start="00:07:29.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""score. But when you have rare words that only occur in a""" start="00:07:32.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""few documents, they're weighted a lot more. So the more""" start="00:07:35.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""those rare words occur, they boost the score higher.""" start="00:07:37.680" video="mainVideo-p-search" id="subtitle"]]
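
As a rough illustration of the multiplication described above, here is a minimal Python sketch of one common tf-idf variant (term count divided by document length, times the natural log of total documents over documents containing the term). Many other weightings exist, so treat the exact formula as an assumption. The three-document corpus is invented for the example.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Score one term for one document: term frequency times inverse document frequency."""
    # Term frequency: how often the term occurs, relative to document length.
    tf = doc_tokens.count(term) / len(doc_tokens)
    # Document frequency: how many documents contain the term at all.
    containing = sum(1 for tokens in corpus if term in tokens)
    if containing == 0:
        return 0.0
    # Inverse document frequency: zero when the term is in every document.
    idf = math.log(len(corpus) / containing)
    return tf * idf


# Placeholder corpus, invented for this example.
corpus = [
    "the display engine draws the window".split(),
    "the buffer holds the text".split(),
    "the window shows the buffer".split(),
]

for term in ("the", "window", "display"):
    print(term, round(tf_idf(term, corpus[0], corpus), 4))
```

In this toy corpus "the" appears in every document, so its idf is `log(3/3) = 0` and it contributes nothing, while "display", which appears in only one document, scores higher than "window", which appears in two.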

[[!template new="1" text="""BM25""" start="00:07:41.160" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""BM25 is a modification of this. It's essentially TF, it's""" start="00:07:41.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""essentially the previous one, except it dampens out terms""" start="00:07:48.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that occur more often. Imagine you have a bunch of""" start="00:07:53.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""documents. One has a term 10 times, one has a term, that same""" start="00:07:55.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""term a hundred times, another has a thousand times.""" start="00:07:59.360" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""You'll see the score dampens off as the number of""" start="00:08:02.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""occurrences increases. That prevents any one term from""" start="00:08:06.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""overpowering the score. This is the algorithm I ended up""" start="00:08:10.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""choosing for my implementation. So with a plan of using a""" start="00:08:16.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""command line tool like ripgrep to get term occurrences, and""" start="00:08:21.040" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""then using a scoring algorithm like BM25 to rank the terms,""" start="00:08:29.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""we can combine this together and create a simple search""" start="00:08:36.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""mechanism.""" start="00:08:40.080" video="mainVideo-p-search" id="subtitle"]]
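
The dampening effect can be seen in a small Python sketch of the usual BM25 per-term formula. The parameter values `k1=1.2` and `b=0.75` are commonly quoted defaults, assumed here rather than taken from p-search, and the corpus statistics (1000 documents, 50 containing the term, equal document lengths) are invented so that only the occurrence count varies.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, n_docs, n_docs_with_term,
                    k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document's score."""
    # Inverse document frequency: rare terms carry more weight.
    idf = math.log((n_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5) + 1.0)
    # Length normalization: long documents need more occurrences to score the same.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Saturating term frequency: this factor approaches (k1 + 1) as freq grows.
    return idf * freq * (k1 + 1.0) / (freq + norm)


# Same term occurring 10, 100, and 1000 times in otherwise comparable documents.
scores = [
    bm25_term_score(freq, doc_len=5000, avg_doc_len=5000,
                    n_docs=1000, n_docs_with_term=50)
    for freq in (10, 100, 1000)
]
for freq, score in zip((10, 100, 1000), scores):
    print(f"{freq:>5} occurrences -> {score:.3f}")
```

Going from 10 to 100 occurrences raises the score only slightly, and going from 100 to 1000 raises it even less, because the frequency factor can never exceed `k1 + 1`. That ceiling is what keeps a single term from overpowering the total.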

[[!template new="1" text="""Searching with p-search""" start="00:08:41.200" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""Here we're in the directory for the Emacs source code.""" start="00:08:41.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Let's say we want to search for the display code. We""" start="00:08:47.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""run the p-search command, starting the search engine. It""" start="00:08:53.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""opens up. We notice it has three sections, the candidate""" start="00:08:58.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""generators, the priors, and the search results. The""" start="00:09:01.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""candidate generators generate the search space we're""" start="00:09:05.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""looking on. These are all composable and you can add as""" start="00:09:10.000" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""many as you want. So with this, it specifies that here""" start="00:09:14.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""we're searching on the file system and we're searching in""" start="00:09:19.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""this directory. We're using the ripgrep tool to search""" start="00:09:25.240" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""with, and we want to make sure that we're searching only on""" start="00:09:30.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""files committed to Git. Here we see the search results.""" start="00:09:33.360" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Notice here is their final probability. Here, notice""" start="00:09:40.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that they're all the same, and they're the same because we""" start="00:09:45.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""don't have any search criteria specified here. Suppose""" start="00:09:47.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""we want to search for display-related code. We add a""" start="00:09:50.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""query: display.""" start="00:09:55.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So then it spins off the processes, gets the search term""" start="00:09:57.360" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""counts and calculates the new scores. Notice here that""" start="00:10:06.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the results that come out on top, just at first glance, appear""" start="00:10:10.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""to be relevant to display. Remember, if we compare""" start="00:10:15.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that to just running raw ripgrep, notice here we're""" start="00:10:19.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""getting 53,000 results and it's pretty hard to go through""" start="00:10:25.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""these results and make sense of it.""" start="00:10:31.280" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So that's p-search in a nutshell.""" start="00:10:34.320" video="mainVideo-p-search" id="subtitle"]]
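
The step of getting search term counts from ripgrep might look roughly like the Python sketch below. It is an assumption about the general approach, not p-search's actual code (which is Emacs Lisp and may work differently). It relies on ripgrep's `--count-matches` flag, which prints one `path:count` line per matching file, and the helper names and the sample output are hypothetical.

```python
import subprocess


def parse_count_output(output):
    """Turn lines of the form 'path:count' into a {path: count} dictionary."""
    counts = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the last colon so paths that contain colons still parse.
        path, _, count = line.rpartition(":")
        counts[path] = int(count)
    return counts


def count_matches(term, directory):
    """Run ripgrep and return per-file match counts for a literal term."""
    result = subprocess.run(
        ["rg", "--count-matches", "--with-filename", "--fixed-strings",
         "--ignore-case", term, directory],
        capture_output=True,
        text=True,
        check=False,  # ripgrep exits with status 1 when nothing matches
    )
    return parse_count_output(result.stdout)


# Made-up sample output, used here so the example runs without ripgrep installed.
sample = "src/xdisp.c:1520\nsrc/dispnew.c:310\nlisp/window.el:42\n"
ranked = sorted(parse_count_output(sample).items(),
                key=lambda item: item[1], reverse=True)
print(ranked)
```

Sorting by raw count, as done here, is only a stand-in. In the workflow described above the counts would be fed into a scoring function such as BM25 instead.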

[[!template new="1" text="""Flight AF 447""" start="00:10:41.457" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""Next, I wanted to talk about the story of Flight 447.""" start="00:10:41.457" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Flight 447 going from Rio de Janeiro to Paris""" start="00:10:45.983" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""crashed somewhere in the Atlantic Ocean""" start="00:10:49.327" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""on June 1st, 2009, killing everyone on board.""" start="00:10:51.510" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Four search attempts were made to find the wreckage.""" start="00:10:54.714" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""None of them were successful, except the finding of some debris""" start="00:10:56.895" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""and a dead body. It was decided that they really wanted""" start="00:11:01.076" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""to find the wreckage to retrieve data as to why the crash""" start="00:11:05.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""occurred. This occurred two years after the""" start="00:11:09.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""initial crash. With this next search attempt, they""" start="00:11:14.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""wanted to create a probability distribution of where the""" start="00:11:19.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""crash could be. The only piece of concrete data they had""" start="00:11:23.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""was a GPS signal from the ship at 2:10 containing the GPS""" start="00:11:26.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""location of the plane: 2.98 degrees north, 30.59""" start="00:11:35.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""degrees west. That was the only data they had to go off of.""" start="00:11:40.240" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So they drew a circle around that point""" start="00:11:44.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""with a radius of 40 nautical miles. They assumed that""" start="00:11:50.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""anything outside the circle would have been impossible for""" start="00:11:54.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the ship to reach. This was the starting point for""" start="00:11:57.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""creating the probability distribution of where the""" start="00:12:01.240" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""wreckage was located. Anything outside the circle, they""" start="00:12:04.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""assumed it was impossible to reach.""" start="00:12:08.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""The only other pieces of data were the four failed search""" start="00:12:09.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""attempts and then some of the debris found. One thing they""" start="00:12:16.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""did decide was to look at similar crashes where control was""" start="00:12:21.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""lost to analyze where the crashes landed, compared to where""" start="00:12:26.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the loss of control started. This probability""" start="00:12:30.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""distribution, the circular normal distribution was""" start="00:12:37.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""decided upon. Here you can see that the center has a lot""" start="00:12:43.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""higher chance of finding the wreckage. As you go away""" start="00:12:47.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""from the center, the probability of finding the wreckage""" start="00:12:51.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""decreases a lot. The next thing they looked at was, well,""" start="00:12:55.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""they noticed they had retrieved some dead bodies from the""" start="00:13:02.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""wreckage. So they thought that they could calculate the""" start="00:13:05.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""backward drift on that particular day to find where the""" start="00:13:12.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""crash might've occurred. If they found bodies at a""" start="00:13:18.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""particular location, they can kind of work backwards from""" start="00:13:21.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that in order to find where the initial crash occurred.""" start="00:13:25.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So here you can see the probability distribution based off of""" start="00:13:30.666" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the backward drift model. Here you see the darker colors""" start="00:13:34.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""have a higher probability of finding the location. So""" start="00:13:40.280" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""with all these pieces of data, so with that circular 40""" start="00:13:46.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""nautical mile uniform distribution, with that circular""" start="00:13:50.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""normal distribution of comparing similar crashes, as well""" start="00:13:54.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""as with the backward drift, they were able to combine all""" start="00:14:02.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""three of these pieces""" start="00:14:07.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""in order to come up with a final prior distribution of where""" start="00:14:08.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the wreckage was located. So this is the final model""" start="00:14:14.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""they came upon. Here you can see it has that 40 nautical""" start="00:14:19.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""mile radius circle. It has that darker center, which""" start="00:14:24.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""indicates a higher probability because of the""" start="00:14:29.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""crash similarity. Then here you also see along this line""" start="00:14:32.040" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""has a slightly higher probability due to the backward drift""" start="00:14:38.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""distribution.""" start="00:14:50.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So the next thing is, since they had performed searches,""" start="00:14:52.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""they decided to incorporate the data from those searches""" start="00:14:56.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""into their new distribution. Here you can see places""" start="00:15:00.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""where they searched initially. If you think about it,""" start="00:15:04.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""you can assume that, well, if you search for something,""" start="00:15:08.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""there's a good chance you'll find it, but not necessarily.""" start="00:15:11.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Anywhere where they searched, the probability of""" start="00:15:14.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""finding it there is greatly reduced. It's not zero because""" start="00:15:18.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""obviously you can look for something and miss it, but it kind""" start="00:15:22.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""of reduces the probability that we would expect to find it in""" start="00:15:26.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""those already searched locations. This is the""" start="00:15:31.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""posterior distribution, or the distribution after accounting for""" start="00:15:36.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the observations made.""" start="00:15:41.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Here we can see kind of these cutouts of where the""" start="00:15:44.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""previous searches occurred. This is the final""" start="00:15:48.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""distribution they went off of to perform the subsequent""" start="00:15:53.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""search. In the end, the wreckage was found at a point close to""" start="00:15:57.000" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the center here, thus validating this methodology.""" start="00:16:02.000" video="mainVideo-p-search" id="subtitle"]]
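
The update described above, where an unsuccessful search lowers the probability of the searched area without setting it to zero, can be sketched in a few lines of Python. This is only an illustration of the general Bayesian search idea: the cell names, the prior values, and the `detection_prob` parameter are made up for the example and are not taken from the actual search.

```python
def update_after_failed_search(prior, searched, detection_prob):
    """Return the posterior over cells after an unsuccessful search.

    prior: dict mapping cell name -> prior probability (sums to 1)
    searched: set of cells that were searched without finding anything
    detection_prob: assumed chance of spotting the object in a searched
        cell if it really is there (hypothetical value, below 1)
    """
    # A searched cell keeps only the probability that the search missed it.
    unnormalized = {
        cell: p * (1.0 - detection_prob) if cell in searched else p
        for cell, p in prior.items()
    }
    # Renormalize so the posterior sums to 1 again.
    total = sum(unnormalized.values())
    return {cell: p / total for cell, p in unnormalized.items()}


# Hypothetical three-cell map; cell "A" was searched without success.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
posterior = update_after_failed_search(prior, {"A"}, detection_prob=0.9)
```

The searched cell ends up with a small but nonzero probability, and the remaining probability mass shifts to the cells that have not been searched yet.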

[[!template new="1" text="""Modifying priors""" start="00:16:06.771" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""We can see the power of this Bayesian search methodology""" start="00:16:06.771" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""in the way that we could take information from all the sources we had.""" start="00:16:10.333" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""We could draw analogies to similar situations.""" start="00:16:14.000" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""We can quantify these, combine them into a model,""" start="00:16:19.238" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""and then also update our model according to each observation we make.""" start="00:16:22.480" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""I think there's a lot of similarities to be drawn with""" start="00:16:27.894" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""searching on a computer in the sense that when we search for""" start="00:16:30.360" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""something, there's oftentimes a story we kind of have as to""" start="00:16:35.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""what search terms exist, where we expect to find the file.""" start="00:16:39.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""For example, if you're implementing a new feature, you'll""" start="00:16:43.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""often have some search terms in mind that you think will be""" start="00:16:46.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""relevant. Some search terms, you might think they have a""" start="00:16:49.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""possibility of being relevant, but maybe you're not sure.""" start="00:16:54.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""There's some directories where you know that they're not""" start="00:16:57.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""relevant. There's other criteria like, well, you know that""" start="00:17:02.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""maybe somebody in particular worked on this code.""" start="00:17:07.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""What if you could incorporate that information? Like, I know""" start="00:17:11.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""this author, he's always working on this feature. What if""" start="00:17:16.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""I just give the files that this person works on a higher""" start="00:17:21.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""probability than ones he doesn't work on? Or maybe you think""" start="00:17:25.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that this is a file that's committed to often. You think""" start="00:17:32.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that maybe the number of commits it receives""" start="00:17:38.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""should change your probability of this file being""" start="00:17:43.440" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""relevant. That's where p-search comes in.""" start="00:17:47.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Its aim is to be a framework in order to incorporate all these""" start="00:17:52.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""sorts of different prior information into your searching""" start="00:17:57.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""process. You're able to say things like, I want files""" start="00:18:01.360" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""authored by this user to be given higher probability. I want""" start="00:18:06.000" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""this author to be given a lower priority. I know this author""" start="00:18:11.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""never works on this code. If he has a commit, then lower its""" start="00:18:13.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""probability, or you can specify specific paths, or you can""" start="00:18:18.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""specify multiple search terms, weighting different ones""" start="00:18:24.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""according to how you think those terms should be relevant.""" start="00:18:30.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So with p-search, we're able to incorporate information""" start="00:18:38.920" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""from multiple sources. Here, for example, we have a prior""" start="00:18:42.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""of type git author, and we're looking for all of the files""" start="00:18:46.280" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that are committed to by Lars. So the more commits he has,""" start="00:18:52.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the higher probability is given to that file. Suppose""" start="00:18:56.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""there's a feature I know he worked on, but I don't know the""" start="00:19:01.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""file or necessarily even key terms of it. Well, with this, I""" start="00:19:04.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""can incorporate that information.""" start="00:19:09.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So let's search again. Let's add display.""" start="00:19:12.141" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Let's see what responses we get back here. We can add""" start="00:19:16.000" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""as many of these criteria as we want. We can even specify that""" start="00:19:22.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the title of the file name should be a certain type. Let's""" start="00:19:27.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""say we're only concerned about C files. We add that the file""" start="00:19:31.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""name should contain .c in it. With this, now we""" start="00:19:36.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""notice that all of the C files containing display authored""" start="00:19:45.400" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""by Lars should be given higher probability. We can""" start="00:19:51.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""continue to add these priors as we see fit. The workflow""" start="00:19:56.280" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""that I found helps when searching is that you'll add""" start="00:20:02.720" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""criteria, you'll see some good results come up and some bad""" start="00:20:07.520" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""results come up. So you'll often find a pattern in those""" start="00:20:11.360" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""bad results, like, oh, I don't want test files, or this""" start="00:20:15.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""directory isn't relevant, or something like that. Then""" start="00:20:18.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""you can update your prior distribution, adding those""" start="00:20:22.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""criteria, and then rerun it, and then it will get different""" start="00:20:27.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""probabilities for the files. So in the end, you'll have a""" start="00:20:31.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""list of results that's tailor-made to the thing you're""" start="00:20:35.160" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""searching for.""" start="00:20:37.640" video="mainVideo-p-search" id="subtitle"]]
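
One way to picture how several priors combine into a single ranking is the Python sketch below. It is not p-search's implementation: the file names, commit counts, probability values, and function names are all hypothetical. The only assumption is that each prior gives every file a score between 0 and 1 and that the scores are multiplied and renormalized.

```python
def author_prior(name, info):
    # More commits by the author of interest -> higher score (capped at 10).
    return 0.3 + 0.6 * min(info["author_commits"], 10) / 10


def term_prior(term):
    # Files whose text contains the term score higher than those without it.
    return lambda name, info: 0.7 if term in info["text"] else 0.3


def filename_prior(substring):
    # Files whose name contains the substring score higher.
    return lambda name, info: 0.7 if substring in name else 0.3


def combine_priors(files, priors):
    """Multiply every prior's score per file, then normalize to sum to 1."""
    scores = {}
    for name, info in files.items():
        score = 1.0
        for prior in priors:
            score *= prior(name, info)
        scores[name] = score
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


# Made-up repository contents for illustration only.
files = {
    "src/display.c": {"text": "redraw the display", "author_commits": 8},
    "lisp/window.el": {"text": "split the display window", "author_commits": 2},
    "test/display-tests.el": {"text": "tests", "author_commits": 0},
}
priors = [author_prior, term_prior("display"), filename_prior(".c")]
ranking = combine_priors(files, priors)
best = max(ranking, key=ranking.get)
```

Adding another criterion, such as a prior that penalizes test files, is just one more entry in the `priors` list, followed by a rerun.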

[[!template new="1" text="""Importance""" start="00:20:40.405" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""There's a couple of other features I""" start="00:20:40.405" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""want to go through. One thing is that for each of these priors,""" start="00:20:41.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""you can specify the importance. In other words, how""" start="00:20:49.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""important is this particular piece of information to your""" start="00:20:55.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""search? So here, everything is of importance medium. But""" start="00:21:01.120" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""let's say I really care about something having the word""" start="00:21:05.200" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""display in it. I'm going to change its importance.""" start="00:21:07.880" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""Instead of medium, I'll change its importance to high.""" start="00:21:12.680" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""What that does essentially is things that don't have""" start="00:21:18.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""display in them are given a much bigger penalty and things with""" start="00:21:23.280" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the word display in them are rated much higher.""" start="00:21:28.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""With this, we're able to fine-tune the results that we get.""" start="00:21:28.129" video="mainVideo-p-search" id="subtitle"]]
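
A rough way to think about importance is that it controls how far apart the scores for matching and non-matching files are. The Python sketch below illustrates that idea only. The three levels and their numeric values are assumptions made for the example, not the numbers p-search uses.

```python
# Hypothetical mapping from importance level to the score given on a match.
# A non-matching file receives the complementary score (1 - value).
IMPORTANCE = {"low": 0.6, "medium": 0.7, "high": 0.9}


def term_score(text, term, importance="medium"):
    """Score a document for one search term at a given importance level."""
    p_match = IMPORTANCE[importance]
    return p_match if term in text else 1.0 - p_match


hit = "code that updates the display"
miss = "code that parses arguments"

medium_gap = term_score(hit, "display", "medium") - term_score(miss, "display", "medium")
high_gap = term_score(hit, "display", "high") - term_score(miss, "display", "high")
```

Raising the importance from medium to high widens the gap between the two documents, which matches the bigger penalty and higher rating described above.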

[[!template new="1" text="""Complement or inverse""" start="00:21:38.560" video="mainVideo-p-search" id="subtitle"]]

[[!template text="""Another thing you can do is that you can add the complement or""" start="00:21:38.560" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the inverse of certain queries. Let's say you want to""" start="00:21:45.640" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""search for display, but you don't want it to contain the word""" start="00:21:49.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""frame. With the complement option on, when we create this""" start="00:21:53.240" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""search prior, now it's going to be searching for frame, but""" start="00:21:58.040" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""instead of increasing the search score, it's going to""" start="00:22:01.840" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""decrease it if it contains the word frame.""" start="00:22:04.960" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""So here, things related to frame are kind of""" start="00:22:07.000" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""deprioritized. We can also say that we really don't want""" start="00:22:14.320" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""the search to contain the word frame by increasing its""" start="00:22:18.080" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""importance. So with all these composable pieces, we can""" start="00:22:21.600" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""create kind of a search that's tailor-made to our needs.""" start="00:22:27.200" video="mainVideo-p-search" id="subtitle"]]
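
The complement option can be sketched in Python as flipping a prior's score, so that a match lowers the result instead of raising it. As before, this is an illustrative model with made-up documents and probability values, not the way p-search computes its scores.

```python
def term_score(text, term, p_match=0.7, complement=False):
    """Score a document for a term; with complement=True a match is penalized."""
    score = p_match if term in text else 1.0 - p_match
    return 1.0 - score if complement else score


def rank(text, frame_importance=0.7):
    # Want "display", do not want "frame"; a higher frame_importance
    # means a stronger penalty for documents that mention "frame".
    return (term_score(text, "display")
            * term_score(text, "frame", p_match=frame_importance, complement=True))


with_frame = "display and frame handling"
without_frame = "display handling only"

medium = (rank(with_frame), rank(without_frame))
strong = (rank(with_frame, 0.9), rank(without_frame, 0.9))
```

Both documents mention display, but the one that also mentions frame ranks lower, and the difference grows as the importance of the complemented prior is raised.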
[[!template text="""That concludes this talk. There's a lot more I could talk""" start="00:22:33.413" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""about with regards to p-search, so definitely follow the""" start="00:22:35.760" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""project if you're interested. Thanks for watching, and I""" start="00:22:37.800" video="mainVideo-p-search" id="subtitle"]]
[[!template text="""hope you enjoy the rest of the conference.""" start="00:22:40.640" video="mainVideo-p-search" id="subtitle"]]



Captioner: sachac

Questions or comments? Please e-mail [zacromero@posteo.com](mailto:zacromero@posteo.com?subject=Comment%20for%20EmacsConf%202023%20p-search%3A%20p-search%3A%20a%20local%20search%20engine%20in%20Emacs)


<!-- End of emacsconf-publish-after-page -->