WEBVTT captioned by sachac

00:00:00.000 --> 00:00:03.240
Hey everyone, my name is Abin Simon

00:00:03.240 --> 00:00:05.080
and this talk is about "Tree-sitter:

00:00:05.080 --> 00:00:08.200
Beyond Syntax Highlighting."

00:00:08.200 --> 00:00:10.720
For those who are not aware of what Tree-sitter is,

00:00:10.720 --> 00:00:11.720
let me give you a quick intro.

00:00:11.720 --> 00:00:17.120
Tree-sitter, at its core, is a parser generator tool

00:00:17.120 --> 00:00:19.440
and an incremental parsing library.

00:00:19.440 --> 00:00:22.000
What it essentially means is that it gives you

00:00:22.000 --> 00:00:23.154
an always up-to-date

00:00:23.155 --> 00:00:24.200
AST [abstract syntax tree] of your code.

00:00:24.200 --> 00:00:27.960
In the current Emacs frame, what you see to the right

00:00:27.960 --> 00:00:30.840
is the AST produced by Tree-sitter

00:00:30.840 --> 00:00:33.560
of the code that is on the left.

00:00:33.560 --> 00:00:37.000
For example, if you go to this "if" statement,

00:00:37.000 --> 00:00:38.840
you can see it goes here.

00:00:38.840 --> 00:00:41.440
It is also really good at handling errors.

00:00:41.440 --> 00:00:44.400
For example, if I were to delete this [if statement],

00:00:44.400 --> 00:00:47.960
it still parses out a tree as much as it can,

00:00:47.960 --> 00:00:50.280
but with an error node.

00:00:50.280 --> 00:00:51.760
Now let's see how we can query the tree

00:00:51.760 --> 00:00:54.440
to get the information that we need.

00:00:54.440 --> 00:01:01.480
Let's first try to get all the identifiers in the buffer.

00:01:01.480 --> 00:01:04.000
It highlights all the identifiers in the buffer,

00:01:04.000 --> 00:01:05.440
but let's say we want to get something

00:01:05.440 --> 00:01:07.280
a little more precise.

00:01:07.280 --> 00:01:10.400
Let's say we wanted to get this "i" here.

00:01:10.400 --> 00:01:13.280
This, in our case, would be this identifier

00:01:13.280 --> 00:01:15.200
inside this assignment expression

00:01:15.200 --> 00:01:27.320
inside this "for" statement.

00:01:27.320 --> 00:01:29.920
We can write it out like this.

00:01:29.920 --> 00:01:31.880
I hope this gives you a basic idea

00:01:31.880 --> 00:01:34.480
of how Tree-sitter works and how you can query

00:01:34.480 --> 00:01:37.040
to get the information that you need.

00:01:37.040 --> 00:01:39.520
First of all, let's see how Tree-sitter can help us

00:01:39.520 --> 00:01:41.880
with syntax highlighting.

00:01:41.880 --> 00:01:46.480
This is the default syntax highlighting by Emacs for SQL.

00:01:46.480 --> 00:01:52.000
Now let's see how Tree-sitter helps.

00:01:52.000 --> 00:01:54.240
This is the syntax highlighting in Emacs

00:01:54.240 --> 00:01:56.760
with Tree-sitter enabled.

00:01:56.760 --> 00:01:58.240
You'll see that we're able to target

00:01:58.240 --> 00:02:01.240
a lot more things and highlight them.

00:02:01.240 --> 00:02:03.138
That said, you don't always have to

00:02:03.139 --> 00:02:04.200
highlight everything.

00:02:04.200 --> 00:02:15.640
I personally prefer a much simpler theme.

00:02:15.640 --> 00:02:17.880
Now let's see how Tree-sitter helps you simplify

00:02:17.880 --> 00:02:20.920
adding custom syntax highlighting to your code.

00:02:20.920 --> 00:02:22.200
This is a Python file which has

00:02:22.200 --> 00:02:25.640
a class and a few member functions.

00:02:25.640 --> 00:02:27.680
Anyone who has used Python will know that

00:02:27.680 --> 00:02:32.040
the "self" parameter, while it is passed in as an argument,

00:02:32.040 --> 00:02:34.240
has more meaning than that.

00:02:34.240 --> 00:02:35.480
Let's see if you can use Tree-sitter

00:02:35.480 --> 00:02:38.720
to highlight just the "self" parameter.

00:02:38.720 --> 00:02:40.400
If you look at the Tree-sitter tree,

00:02:40.400 --> 00:02:43.120
you can see that this is the first identifier

00:02:43.120 --> 00:02:45.520
in the list of parameters for a function definition.

00:02:45.520 --> 00:02:55.480
This is how you would query for the first identifier

00:02:55.480 --> 00:02:59.320
inside parameters inside a function definition.

00:02:59.320 --> 00:03:02.520
Now, if you see here, it also matches "cls",

00:03:02.520 --> 00:03:11.360
but let's restrict it to match just "self".

00:03:11.360 --> 00:03:14.200
Now we have a Tree-sitter query that identifies

00:03:14.200 --> 00:03:16.960
the first argument to the function definition

00:03:16.960 --> 00:03:19.640
and is also called "self".
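
NOTE
A minimal sketch of the same match, assuming Python's standard ast module
as a stand-in for the Tree-sitter tree (the Tree-sitter query shown on
screen in the talk is not reproduced here). The names first_self_params
and SAMPLE are hypothetical and exist only for this illustration.
```python
import ast
def first_self_params(source):
    """Return (function name, line) for each def whose first parameter is 'self'."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = node.args.posonlyargs + node.args.args
            # Match only when a first parameter exists and is literally "self".
            if params and params[0].arg == "self":
                hits.append((node.name, params[0].lineno))
    return hits
# Hypothetical sample: one method takes "self", the other takes "cls".
SAMPLE = '''
class Greeter:
    def hello(self, name):
        return name
    @classmethod
    def build(cls):
        return cls()
'''
print(first_self_params(SAMPLE))
```
Only hello is reported, because build takes "cls" as its first parameter.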

00:03:19.640 --> 00:03:22.520
We can use this to apply custom highlighting onto this.

00:03:22.520 --> 00:03:25.000
This is pretty much all the code

00:03:25.000 --> 00:03:26.520
that you'll need to do this.

00:03:26.520 --> 00:03:29.240
The first block here essentially tells

00:03:29.240 --> 00:03:32.160
Tree-sitter to highlight anything with python.self

00:03:32.160 --> 00:03:35.720
with the custom-set face.

00:03:35.720 --> 00:03:37.520
Now the second block here essentially is

00:03:37.520 --> 00:03:39.800
how we match for that.

00:03:39.800 --> 00:03:41.800
Now if you go back into a Python buffer

00:03:41.800 --> 00:03:44.680
and re-enable python-mode, we'll see that "self"

00:03:44.680 --> 00:03:47.120
is highlighted differently.

00:03:47.120 --> 00:03:48.880
How about creating text objects?

00:03:48.880 --> 00:03:50.440
Tree-sitter can help there too.

00:03:50.440 --> 00:03:53.080
For those who don't know, text objects

00:03:53.080 --> 00:03:54.440
are an idea that comes from Vim,

00:03:54.440 --> 00:03:57.760
and you can do things like select word,

00:03:57.760 --> 00:04:00.520
delete word, things like that.

00:04:00.520 --> 00:04:06.200
There are other text objects like line and paragraph.

00:04:06.200 --> 00:04:09.000
For each text object, you can have operations

00:04:09.000 --> 00:04:09.760
that are defined on them.

00:04:09.760 --> 00:04:13.600
For example, delete, copy, select, comment,

00:04:13.600 --> 00:04:16.400
all of these are operations that you can do.

00:04:16.400 --> 00:04:19.400
Let's try and use Tree-sitter to add more text objects.

00:04:19.400 --> 00:04:20.560
This is a plugin that I wrote

00:04:20.560 --> 00:04:25.000
which lets you add more text objects into Emacs.

00:04:25.000 --> 00:04:27.880
It gives you code-aware text objects

00:04:27.880 --> 00:04:31.880
like functions, conditionals, loops, and such.

00:04:31.880 --> 00:04:34.360
Let's see an example scenario of how

00:04:34.360 --> 00:04:35.920
something like this could come in handy.

00:04:35.920 --> 00:04:39.280
For example, I can select inside this condition

00:04:39.280 --> 00:04:42.960
or inside this function and do things like that.

00:04:42.960 --> 00:04:44.520
Let's say I want to take this conditional,

00:04:44.520 --> 00:04:47.160
move to the next function, and create it here.

00:04:47.160 --> 00:04:49.640
What I would do is something like

00:04:49.640 --> 00:04:52.320
delete the conditional, move to the next function,

00:04:52.320 --> 00:04:56.240
create a conditional there, and paste.

00:04:56.240 --> 00:04:57.160
Let's try another example.

00:04:57.160 --> 00:05:01.360
Let's say I want to take this and move it to the end.

00:05:01.360 --> 00:05:02.960
If I had to do it without text objects,

00:05:02.960 --> 00:05:06.800
I'd probably have to go back to the previous comma,

00:05:06.800 --> 00:05:10.440
delete till the next comma, find the closing bracket,

00:05:10.440 --> 00:05:11.880
and paste before.

00:05:11.880 --> 00:05:14.040
That works, but let's see

00:05:14.040 --> 00:05:16.520
how Tree-sitter can simplify it.

00:05:16.520 --> 00:05:19.240
With Tree-sitter, I can say delete the argument,

00:05:19.240 --> 00:05:22.880
go to the end of the next argument, and then paste.

00:05:22.880 --> 00:05:25.280
Tree-sitter essentially helps Emacs

00:05:25.280 --> 00:05:27.240
understand the code better semantically.

00:05:27.240 --> 00:05:29.600
Here is yet another use case.

00:05:29.600 --> 00:05:31.480
I work at a remote company,

00:05:31.480 --> 00:05:33.440
and I often find myself being in a call

00:05:33.440 --> 00:05:35.400
with my teammates, explaining the code to them.

00:05:35.400 --> 00:05:38.000
And one thing that really comes in handy

00:05:38.000 --> 00:05:39.760
is the narrowing capability of Emacs.

00:05:39.760 --> 00:05:43.040
Specifically, the fancy-narrow package.

00:05:43.040 --> 00:05:44.840
I use it to narrow just the function,

00:05:44.840 --> 00:05:48.760
or I could narrow to the conditional.

00:05:48.760 --> 00:05:51.520
Next on the list would be code folding.

00:05:51.520 --> 00:05:54.480
This is a package which uses Tree-sitter

00:05:54.480 --> 00:05:57.560
to improve the code folding functionalities of Emacs.

00:05:57.560 --> 00:06:00.200
Code folding has always been this thing

00:06:00.200 --> 00:06:02.280
that I've had a love-hate relationship with.

00:06:02.280 --> 00:06:04.280
It usually works most of the time,

00:06:04.280 --> 00:06:06.960
but then fails if the indentation is wrong

00:06:06.960 --> 00:06:09.160
or we do something weird with the arguments.

00:06:09.160 --> 00:06:11.680
But now with Tree-sitter in the mix,

00:06:11.680 --> 00:06:12.720
it's a lot more precise.

00:06:12.720 --> 00:06:17.040
I can fold comments, I can fold functions,

00:06:17.040 --> 00:06:20.480
I can fold conditionals. You get the idea.

00:06:20.480 --> 00:06:23.840
I work with Kubernetes, which means I end up

00:06:23.840 --> 00:06:28.080
having to write and read a lot of YAML files.

00:06:28.080 --> 00:06:31.840
And navigating big YAML files is a mess.

00:06:31.840 --> 00:06:35.760
The two main problems are figuring out where I am,

00:06:35.760 --> 00:06:38.760
and navigating to where I want to be.

00:06:38.760 --> 00:06:41.760
Let's see how Tree-sitter can help us with both of these.

00:06:41.760 --> 00:06:43.840
This is an example YAML file.

00:06:43.840 --> 00:06:47.080
To be precise, this is the values file

00:06:47.080 --> 00:06:48.640
of the Redis Helm chart.

00:06:48.640 --> 00:06:52.240
I'm somewhere in the file on tag under image,

00:06:52.240 --> 00:06:54.880
but I don't know what this tag is for.

00:06:54.880 --> 00:06:57.240
But with the help of Tree-sitter,

00:06:57.240 --> 00:06:59.160
I've been able to add this information

00:06:59.160 --> 00:07:00.440
into my header line.

00:07:00.440 --> 00:07:02.960
If you see in the header line,

00:07:02.960 --> 00:07:05.880
you'll see that I'm under sentinel.image.

00:07:05.880 --> 00:07:08.800
Now let's see how this helps with navigation.

00:07:08.800 --> 00:07:12.680
Let's say I want to enable persistence on the master node.

00:07:12.680 --> 00:07:18.200
So with the help of Tree-sitter,

00:07:18.200 --> 00:07:20.400
I was able to enumerate every field

00:07:20.400 --> 00:07:22.200
that is available in this YAML file,

00:07:22.200 --> 00:07:24.520
and I can pass that information onto imenu,

00:07:24.520 --> 00:07:28.040
which I can then use to go to exactly where I want to.
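
NOTE
A minimal sketch of the enumeration step, assuming the YAML file has
already been parsed into nested dictionaries (the talk does this with
Tree-sitter inside Emacs and hands the result to imenu). The names
dotted_paths, VALUES and PATHS, and the sample keys, are hypothetical.
```python
def dotted_paths(tree, prefix=""):
    """Yield a dotted path for every key in a nested mapping, parents first."""
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        yield path
        # Descend into nested mappings so that child keys get their own entries.
        if isinstance(value, dict):
            yield from dotted_paths(value, path)
# Hypothetical stand-in for a small, already-parsed values file.
VALUES = {
    "sentinel": {"image": {"tag": "example"}},
    "master": {"persistence": {"enabled": False}},
}
PATHS = list(dotted_paths(VALUES))
print(PATHS)
```
Each dotted path is the kind of entry a jump-to-field menu could offer;
the same walk works for any format that parses into nested mappings.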

00:07:28.040 --> 00:07:30.000
Also, since we're not dealing with

00:07:30.000 --> 00:07:32.600
any language-specific constructs,

00:07:32.600 --> 00:07:34.040
this is very easy to extend to

00:07:34.040 --> 00:07:35.760
other similar languages

00:07:35.760 --> 00:07:37.440
or config files in this case.

00:07:37.440 --> 00:07:39.520
So for example, this is a JSON file,

00:07:39.520 --> 00:07:44.800
and I can navigate to location or project.

00:07:44.800 --> 00:07:48.320
And just like in YAML, it shows me where I'm at.

00:07:48.320 --> 00:07:49.920
I'm in projects.name,

00:07:49.920 --> 00:07:52.880
or I'm inside projects.highlights.

00:07:52.880 --> 00:07:55.600
Or how about Nix?

00:07:55.600 --> 00:07:57.480
This is my home.nix file.

00:07:57.480 --> 00:08:01.040
Again, I can search for services,

00:08:01.040 --> 00:08:04.640
and this lists all the services that I've enabled.

00:08:04.640 --> 00:08:06.720
How about just services.description?

00:08:06.720 --> 00:08:08.160
So these are all the services

00:08:08.160 --> 00:08:10.480
that I've enabled and that have descriptions.

00:08:10.480 --> 00:08:12.720
Now that we have seen this for config files,

00:08:12.720 --> 00:08:15.040
let's see how similar things apply for code.

00:08:15.040 --> 00:08:16.760
Just like in config files,

00:08:16.760 --> 00:08:18.680
I can see which function I'm under,

00:08:18.680 --> 00:08:21.560
and if I go to the next function, it changes.

00:08:21.560 --> 00:08:23.960
Okay, here is something really awesome.

00:08:23.960 --> 00:08:26.600
This is probably one of my favorites,

00:08:26.600 --> 00:08:30.400
and one of the things that actually made me understand

00:08:30.400 --> 00:08:34.080
how powerful Tree-sitter is, and got me into it.

00:08:34.080 --> 00:08:35.680
I work with a lot of Go code,

00:08:35.680 --> 00:08:38.840
and anyone who has worked with Go will tell you

00:08:38.840 --> 00:08:41.040
how repetitive handling errors is.

00:08:41.040 --> 00:08:42.800
For those who don't write Go,

00:08:42.800 --> 00:08:45.200
let me give you a rough idea of what I'm talking about.

00:08:45.200 --> 00:08:47.000
If you want to bubble up the error,

00:08:47.000 --> 00:08:49.920
the way you would do it is just to return the error

00:08:49.920 --> 00:08:51.400
to the function that called it.

00:08:51.400 --> 00:08:55.720
Over here, you can either return nil or an empty value,

00:08:55.720 --> 00:08:57.640
and at the end, you return the error.

00:08:57.640 --> 00:09:00.200
Let's try and use Tree-sitter to do this.

00:09:00.200 --> 00:09:03.120
Using the help of Tree-sitter, let's make Emacs

00:09:03.120 --> 00:09:06.421
go back, figure out what the return arguments are,

00:09:06.422 --> 00:09:08.240
figure out what their default values are,

00:09:08.240 --> 00:09:11.480
and automatically fill in the return statement.

00:09:11.480 --> 00:09:13.040
It would look something like this.

00:09:13.040 --> 00:09:16.120
In my case, it filled in the complete form,

00:09:16.120 --> 00:09:18.320
it figured out what the return arguments are,

00:09:18.320 --> 00:09:19.320
what their types are,

00:09:19.320 --> 00:09:20.960
and what their default values are,

00:09:20.960 --> 00:09:22.800
and filled out the entire return.
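
NOTE
A minimal sketch of the fill-in logic, assuming the function's return
types have already been read from the syntax tree as plain strings. This
is not the template from the talk: zero_value and return_statement are
hypothetical names, and the type rules are simplified (for example, a
named interface type would be wrongly treated as a struct).
```python
NUMERIC = {"byte", "rune", "uintptr", "float32", "float64", "complex64", "complex128"}
NUMERIC |= {f"{sign}int{bits}" for sign in ("", "u") for bits in ("", "8", "16", "32", "64")}
NIL_PREFIXES = ("*", "[]", "map[", "chan ", "func(")
def zero_value(go_type):
    """Return Go source text for the zero value of a type name (simplified)."""
    if go_type == "error":
        return "err"  # bubble the error up instead of returning nil
    if go_type == "string":
        return '""'
    if go_type == "bool":
        return "false"
    if go_type in NUMERIC:
        return "0"
    if go_type.startswith(NIL_PREFIXES) or go_type in ("any", "interface{}"):
        return "nil"
    return go_type + "{}"  # assumption: anything else is a struct or array type
def return_statement(return_types):
    """Build the return line for an 'if err != nil' block."""
    return "return " + ", ".join(zero_value(t) for t in return_types)
print(return_statement(["*User", "int", "error"]))
```
The type names passed in here are made up; in the editor they would come
from the enclosing function's result list.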

00:09:22.800 --> 00:09:24.760
And since this is a template,

00:09:24.760 --> 00:09:27.720
I can go to the next function, do the same thing,

00:09:27.720 --> 00:09:29.560
next function, do the same thing,

00:09:29.560 --> 00:09:31.520
next function, do the same thing.

00:09:31.520 --> 00:09:34.360
Here is a really fascinating use case of Tree-sitter,

00:09:34.360 --> 00:09:36.320
structural editing.

00:09:36.320 --> 00:09:38.200
You might be aware of plugins like paredit,

00:09:38.200 --> 00:09:40.280
which seem to "know" your code.

00:09:40.280 --> 00:09:42.520
This sort of takes it to another level.

00:09:42.520 --> 00:09:46.040
It is in its early stages, but what this lets you do

00:09:46.040 --> 00:09:48.920
is completely treat your code as an AST,

00:09:48.920 --> 00:09:52.000
and edit as if it's a tree instead of characters.

00:09:52.000 --> 00:09:54.640
I am not going to go much in depth into it,

00:09:54.640 --> 00:09:57.000
but if you're interested, there is a talk

00:09:57.000 --> 00:09:59.080
from last year's EmacsConf around it.

00:09:59.080 --> 00:10:02.320
I'm just going to end this with one last tiny thing

00:10:02.320 --> 00:10:04.920
that I found in the tree-sitter-extras package.

00:10:04.920 --> 00:10:07.600
It's this tiny macro called tree-sitter-save-excursion.

00:10:07.600 --> 00:10:11.240
It works pretty much like save-excursion, but better.

00:10:11.240 --> 00:10:13.400
It uses the Tree-sitter syntax tree

00:10:13.400 --> 00:10:14.800
instead of just the code

00:10:14.800 --> 00:10:16.720
to figure out where to restore the position.

00:10:16.720 --> 00:10:20.200
My main use case for this was with code formatters.

00:10:20.200 --> 00:10:22.080
Since the code moves around a lot

00:10:22.080 --> 00:10:23.160
when it gets formatted,

00:10:23.160 --> 00:10:25.000
save-excursion was completely useless,

00:10:25.000 --> 00:10:26.240
but this came in handy.

00:10:26.240 --> 00:10:28.120
I'll just leave you with

00:10:28.120 --> 00:10:31.120
what the future of Tree-sitter looks like for Emacs.

00:10:31.120 --> 00:10:33.760
So far, every Tree-sitter related feature

00:10:33.760 --> 00:10:36.040
that I've talked about is powered by this library.

00:10:36.040 --> 00:10:42.320
But there is talk about Tree-sitter coming into the core.

00:10:42.320 --> 00:10:45.840
It will most probably be landing in Emacs 29,

00:10:45.840 --> 00:10:48.720
and if you want to check out the work on Tree-sitter

00:10:48.720 --> 00:10:51.200
in core Emacs, you can check out

00:10:51.200 --> 00:10:52.920
the feature/tree-sitter branch.

00:10:52.920 --> 00:10:56.640
You'll probably see more and more features and packages

00:10:56.640 --> 00:10:59.640
relying upon Tree-sitter, and even major modes

00:10:59.640 --> 00:11:01.560
being powered by Tree-sitter.

00:11:01.560 --> 00:11:03.880
And that's a wrap from me. Thank you.