WEBVTT captioned by sachac

00:00:00.000 --> 00:00:03.240
Hey everyone, my name is Abin Simon

00:00:03.240 --> 00:00:05.080
and this talk is about "Tree-sitter:

00:00:05.080 --> 00:00:08.200
Beyond Syntax Highlighting."

00:00:08.200 --> 00:00:10.720
For those who are not aware of what Tree-sitter is,

00:00:10.720 --> 00:00:11.720
let me give you a quick intro.

00:00:11.720 --> 00:00:17.120
Tree-sitter, at its core, is a parser generator tool

00:00:17.120 --> 00:00:19.440
and an incremental parsing library.

00:00:19.440 --> 00:00:22.000
What it essentially means is that it gives you

00:00:22.000 --> 00:00:23.154
an always up-to-date

00:00:23.155 --> 00:00:24.200
AST [abstract syntax tree] of your code.

00:00:24.200 --> 00:00:27.960
In the current Emacs frame, what you see to the right

00:00:27.960 --> 00:00:30.840
is the AST produced by Tree-sitter

00:00:30.840 --> 00:00:33.560
of the code that is on the left.

00:00:33.560 --> 00:00:37.000
For example, if you go to this "if" statement,

00:00:37.000 --> 00:00:38.840
you can see it goes here.

00:00:38.840 --> 00:00:41.440
It is also really good at handling errors.

00:00:41.440 --> 00:00:44.400
For example, if I were to delete this [if statement],

00:00:44.400 --> 00:00:47.960
it still parses out a tree as much as it can,

00:00:47.960 --> 00:00:50.280
but with an error node.

00:00:50.280 --> 00:00:51.760
Now let's see how we can query the tree

00:00:51.760 --> 00:00:54.440
to get the information that we need.

00:00:54.440 --> 00:01:01.480
Let's first try to get all the identifiers in the buffer.

00:01:01.480 --> 00:01:04.000
It highlights all the identifiers in the buffer,

00:01:04.000 --> 00:01:05.440
but let's say we want to get something

00:01:05.440 --> 00:01:07.280
a little more precise.

00:01:07.280 --> 00:01:10.400
Let's say we wanted to get this "i" here.

00:01:10.400 --> 00:01:13.280
This, in our case, would be this identifier

00:01:13.280 --> 00:01:15.200
inside this assignment expression

00:01:15.200 --> 00:01:27.320
inside this "for" statement.

00:01:27.320 --> 00:01:29.920
We can write it out like this.

00:01:29.920 --> 00:01:31.880
I hope this gives you a basic idea

00:01:31.880 --> 00:01:34.480
of how Tree-sitter works and how you can query

00:01:34.480 --> 00:01:37.040
to get the information that you need.

00:01:37.040 --> 00:01:39.520
First of all, let's see how Tree-sitter can help us

00:01:39.520 --> 00:01:41.880
with syntax highlighting.

00:01:41.880 --> 00:01:46.480
This is the default syntax highlighting by Emacs for SQL.

00:01:46.480 --> 00:01:52.000
Now let's see how Tree-sitter helps.

00:01:52.000 --> 00:01:54.240
This is the syntax highlighting in Emacs

00:01:54.240 --> 00:01:56.760
with Tree-sitter enabled.

00:01:56.760 --> 00:01:58.240
You'll see that we're able to target

00:01:58.240 --> 00:02:01.240
a lot more things and highlight them.

00:02:01.240 --> 00:02:03.138
That said, you don't always have to

00:02:03.139 --> 00:02:04.200
highlight everything.

00:02:04.200 --> 00:02:15.640
I personally prefer a much simpler theme.

00:02:15.640 --> 00:02:17.880
Now let's see how Tree-sitter helps you simplify

00:02:17.880 --> 00:02:20.920
adding custom syntax highlighting to your code.

00:02:20.920 --> 00:02:22.200
This is a Python file which has

00:02:22.200 --> 00:02:25.640
a class and a few member functions.

00:02:25.640 --> 00:02:27.680
Anyone who has used Python will know that

00:02:27.680 --> 00:02:32.040
the "self" keyword, while it is passed in as an argument,

00:02:32.040 --> 00:02:34.240
it has more meaning than that.

00:02:34.240 --> 00:02:35.480
Let's see if you can use Tree-sitter

00:02:35.480 --> 00:02:38.720
to highlight just the "self" keyword.

00:02:38.720 --> 00:02:40.400
If you look at the Tree-sitter tree,

00:02:40.400 --> 00:02:43.120
you can see that this is the first identifier

00:02:43.120 --> 00:02:45.520
in the list of parameters for a function definition.

00:02:45.520 --> 00:02:55.480
This is how you would query for the first identifier

00:02:55.480 --> 00:02:59.320
inside parameters inside a function definition.

00:02:59.320 --> 00:03:02.520
Now, if you see here, it also matches "cls",

00:03:02.520 --> 00:03:11.360
but let's restrict it to match just "self".

00:03:11.360 --> 00:03:14.200
Now we have a Tree-sitter query that identifies

00:03:14.200 --> 00:03:16.960
the first argument to the function definition

00:03:16.960 --> 00:03:19.640
and is also called "self".

00:03:19.640 --> 00:03:22.520
We can use this to apply custom highlighting onto this.

00:03:22.520 --> 00:03:25.000
This is pretty much all the code

00:03:25.000 --> 00:03:26.520
that you'll need to do this.

00:03:26.520 --> 00:03:29.240
The first block here is essentially to say to

00:03:29.240 --> 00:03:32.160
Tree-sitter to highlight anything with python.self

00:03:32.160 --> 00:03:35.720
with the face of custom-set.

00:03:35.720 --> 00:03:37.520
Now the second block here essentially is

00:03:37.520 --> 00:03:39.800
how we match for that.

00:03:39.800 --> 00:03:41.800
Now if you go back into a Python buffer

00:03:41.800 --> 00:03:44.680
and re-enable python-mode, we'll see that "self"

00:03:44.680 --> 00:03:47.120
is highlighted differently.

00:03:47.120 --> 00:03:48.880
How about creating text objects?

00:03:48.880 --> 00:03:50.440
Tree-sitter can help there too.

00:03:50.440 --> 00:03:53.080
For those who don't know, text objects

00:03:53.080 --> 00:03:54.440
is an idea that comes from Vim,

00:03:54.440 --> 00:03:57.760
and you can do things like select word,

00:03:57.760 --> 00:04:00.520
delete word, things like that.

00:04:00.520 --> 00:04:06.200
There are other text objects like line and paragraph.

00:04:06.200 --> 00:04:09.000
For each text object, you can have operations

00:04:09.000 --> 00:04:09.760
that are defined on them.

00:04:09.760 --> 00:04:13.600
For example, delete, copy, select, comment,

00:04:13.600 --> 00:04:16.400
all of these are operations that you can do.

00:04:16.400 --> 00:04:19.400
Let's try and use Tree-sitter to add more text objects.

00:04:19.400 --> 00:04:20.560
This is a plugin that I wrote

00:04:20.560 --> 00:04:25.000
which lets you add more text objects into Emacs.

00:04:25.000 --> 00:04:27.880
It gives you code-aware text objects

00:04:27.880 --> 00:04:31.880
like functions, conditionals, loops, and such.

00:04:31.880 --> 00:04:34.360
Let's see an example scenario of how

00:04:34.360 --> 00:04:35.920
something like this could come in handy.

00:04:35.920 --> 00:04:39.280
For example, I can select inside this condition

00:04:39.280 --> 00:04:42.960
or inside this function and do things like that.

00:04:42.960 --> 00:04:44.520
Let's say I want to take this conditional,

00:04:44.520 --> 00:04:47.160
move to the next function, and create it here.

00:04:47.160 --> 00:04:49.640
What I would do is something like

00:04:49.640 --> 00:04:52.320
delete the conditional, move to the next function,

00:04:52.320 --> 00:04:56.240
create a conditional there, and paste.

00:04:56.240 --> 00:04:57.160
Let's try another example.

00:04:57.160 --> 00:05:01.360
Let's say I want to take this and move it to the end.

00:05:01.360 --> 00:05:02.960
If I had to do it without text objects,

00:05:02.960 --> 00:05:06.800
I'd probably have to go back to the previous comma,

00:05:06.800 --> 00:05:10.440
delete till next comma, find the closing bracket,

00:05:10.440 --> 00:05:11.880
and paste before.

00:05:11.880 --> 00:05:14.040
That works, but let's see

00:05:14.040 --> 00:05:16.520
how Tree-sitter can simplify it.

00:05:16.520 --> 00:05:19.240
With Tree-sitter, I can say delete the argument,

00:05:19.240 --> 00:05:22.880
go to the end of the next argument, and then paste.

00:05:22.880 --> 00:05:25.280
Tree-sitter essentially helps Emacs

00:05:25.280 --> 00:05:27.240
understand the code better semantically.

00:05:27.240 --> 00:05:29.600
Here is yet another use case.

00:05:29.600 --> 00:05:31.480
I work at a remote company,

00:05:31.480 --> 00:05:33.440
and I often find myself being in a call

00:05:33.440 --> 00:05:35.400
with my teammates, explaining the code to them.

00:05:35.400 --> 00:05:38.000
And one thing that really comes in handy

00:05:38.000 --> 00:05:39.760
is the narrowing capability of Emacs.

00:05:39.760 --> 00:05:43.040
Specifically, the fancy-narrow package.

00:05:43.040 --> 00:05:44.840
I use it to narrow just the function,

00:05:44.840 --> 00:05:48.760
or I could narrow to the conditional.

00:05:48.760 --> 00:05:51.520
Next on the list would be code folding.

00:05:51.520 --> 00:05:54.480
This is a package which uses Tree-sitter

00:05:54.480 --> 00:05:57.560
to improve the code folding functionalities of Emacs.

00:05:57.560 --> 00:06:00.200
Code folding has always been this thing

00:06:00.200 --> 00:06:02.280
that I've had a love-hate relationship with.

00:06:02.280 --> 00:06:04.280
It usually works most of the time,

00:06:04.280 --> 00:06:06.960
but then fails if the indentation is wrong

00:06:06.960 --> 00:06:09.160
or we do something weird with the arguments.

00:06:09.160 --> 00:06:11.680
But now with Tree-sitter in the mix,

00:06:11.680 --> 00:06:12.720
it's a lot more precise.

00:06:12.720 --> 00:06:17.040
I can fold comments, I can fold functions,

00:06:17.040 --> 00:06:20.480
I can fold conditionals. You get the idea.

00:06:20.480 --> 00:06:23.840
I work with Kubernetes, which means I end up

00:06:23.840 --> 00:06:28.080
having to write and read a lot of YAML files.

00:06:28.080 --> 00:06:31.840
And navigating big YAML files is a mess.

00:06:31.840 --> 00:06:35.760
The two main problems are figuring out where I am,

00:06:35.760 --> 00:06:38.760
and two, navigating to where I want to be.

00:06:38.760 --> 00:06:41.760
Let's see how Tree-sitter can help us with both of these.

00:06:41.760 --> 00:06:43.840
This is an example YAML file.

00:06:43.840 --> 00:06:47.080
To be precise, this is the values file

00:06:47.080 --> 00:06:48.640
of the Redis Helm chart.

00:06:48.640 --> 00:06:52.240
I'm somewhere in the file on tag under image,

00:06:52.240 --> 00:06:54.880
but I don't know what this tag is for.

00:06:54.880 --> 00:06:57.240
But with the help of Tree-sitter,

00:06:57.240 --> 00:06:59.160
I've been able to add this information

00:06:59.160 --> 00:07:00.440
into my header line.

00:07:00.440 --> 00:07:02.960
If you see in the header line,

00:07:02.960 --> 00:07:05.880
you'll see that I'm under sentinel.image.
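
NOTE
Editor's sketch, not the speaker's code: one way to build a header-line path
like sentinel.image is to start at the node under the cursor and walk up
through its ancestors, collecting key names. The Node class below is a
hypothetical stand-in for a Tree-sitter node; it is not a real API.
```python
class Node:
    """Hypothetical stand-in for a syntax-tree node that holds a YAML key."""
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
def breadcrumb(node):
    """Walk from node up to the root, then join the keys root-first."""
    keys = []
    while node is not None:
        keys.append(node.key)
        node = node.parent
    return ".".join(reversed(keys))
sentinel = Node("sentinel")
image = Node("image", parent=sentinel)
tag = Node("tag", parent=image)
print(breadcrumb(tag.parent))  # sentinel.image
```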

00:07:05.880 --> 00:07:08.800
Now let's see how this helps with navigation.

00:07:08.800 --> 00:07:12.680
Let's say I want to enable persistence on the master node.

00:07:12.680 --> 00:07:18.200
So with the help of Tree-sitter,

00:07:18.200 --> 00:07:20.400
I was able to enumerate every field

00:07:20.400 --> 00:07:22.200
that is available in this YAML file,

00:07:22.200 --> 00:07:24.520
and I can pass that information onto imenu,

00:07:24.520 --> 00:07:28.040
which I can then use to go to exactly where I want to.
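
NOTE
Editor's sketch, not the speaker's code: enumerating every field amounts to
flattening the nested structure into dotted paths, which could then be offered
as jump targets. A plain dict stands in for the parsed YAML here, and the
sample keys and values are made up for illustration.
```python
def flatten(mapping, prefix=""):
    """Yield the dotted path of every field in a nested mapping."""
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from flatten(value, path)
# Made-up sample data, loosely shaped like a Helm values file.
values = {
    "master": {"persistence": {"enabled": False}},
    "sentinel": {"image": {"tag": "example"}},
}
for path in flatten(values):
    print(path)
```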

00:07:28.040 --> 00:07:30.000
Also, since we're not dealing with

00:07:30.000 --> 00:07:32.600
any language-specific constructs,

00:07:32.600 --> 00:07:34.040
this is very easy to extend to

00:07:34.040 --> 00:07:35.760
other similar languages

00:07:35.760 --> 00:07:37.440
or config files in this case.

00:07:37.440 --> 00:07:39.520
So for example, this is a JSON file,

00:07:39.520 --> 00:07:44.800
and I can navigate to location or project.

00:07:44.800 --> 00:07:48.320
And just like in YAML, it shows me where I'm at.

00:07:48.320 --> 00:07:49.920
I'm in projects.name,

00:07:49.920 --> 00:07:52.880
or I'm inside projects.highlights.

00:07:52.880 --> 00:07:55.600
Or how about Nix?

00:07:55.600 --> 00:07:57.480
This is my home.nix file.

00:07:57.480 --> 00:08:01.040
Again, I can search for services,

00:08:01.040 --> 00:08:04.640
and this lists all the services that I've enabled.

00:08:04.640 --> 00:08:06.720
How about just services.description?

00:08:06.720 --> 00:08:08.160
So this is all the services

00:08:08.160 --> 00:08:10.480
that I've enabled and have descriptions.

00:08:10.480 --> 00:08:12.720
Now that we have seen this for config files,

00:08:12.720 --> 00:08:15.040
let's see how similar things apply for code.

00:08:15.040 --> 00:08:16.760
Just like in config files,

00:08:16.760 --> 00:08:18.680
I can see which function I'm under,

00:08:18.680 --> 00:08:21.560
and if I go to the next function, it changes.

00:08:21.560 --> 00:08:23.960
Okay, here is something really awesome.

00:08:23.960 --> 00:08:26.600
This is probably one of my favorites,

00:08:26.600 --> 00:08:30.400
and one of the things that actually made me understand

00:08:30.400 --> 00:08:34.080
how powerful Tree-sitter is, and got me into it.

00:08:34.080 --> 00:08:35.680
I work with a lot of Go code,

00:08:35.680 --> 00:08:38.840
and anyone who has worked with Go will tell you

00:08:38.840 --> 00:08:41.040
how repetitive it is handling errors.

00:08:41.040 --> 00:08:42.800
For those who don't write Go,

00:08:42.800 --> 00:08:45.200
let me give you a rough idea of what I'm talking about.

00:08:45.200 --> 00:08:47.000
If you want to bubble up the error,

00:08:47.000 --> 00:08:49.920
the way you would do it is just to return the error

00:08:49.920 --> 00:08:51.400
to the function that called it.

00:08:51.400 --> 00:08:55.720
Over here, you can either return nil or an empty value,

00:08:55.720 --> 00:08:57.640
and at the end, you return error.

00:08:57.640 --> 00:09:00.200
Let's try and use Tree-sitter to do this.

00:09:00.200 --> 00:09:03.120
Using the help of Tree-sitter, let's make Emacs

00:09:03.120 --> 00:09:06.421
go back, figure out what the return arguments are,

00:09:06.422 --> 00:09:08.240
figure out what their default values are,

00:09:08.240 --> 00:09:11.480
and automatically fill in the return statement.

00:09:11.480 --> 00:09:13.040
It would look something like this.

00:09:13.040 --> 00:09:16.120
In my case, it filled in the complete form,

00:09:16.120 --> 00:09:18.320
it figured out what the return arguments are,

00:09:18.320 --> 00:09:19.320
what their types are,

00:09:19.320 --> 00:09:20.960
and what their default values are,

00:09:20.960 --> 00:09:22.800
and filled out the entire return.
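
NOTE
Editor's sketch, not the speaker's snippet: once the return types of the
enclosing Go function are known, the return statement can be generated by
mapping each type to its zero value. The helper names are hypothetical, the
error variable is assumed to be called err, and the type handling is
deliberately rough (a named interface type would be misread as a struct).
```python
ZERO_VALUES = {"string": '""', "bool": "false", "int": "0", "float64": "0"}
NIL_PREFIXES = ("*", "[]", "map[", "chan ", "func(", "interface")
def zero_value(go_type):
    """Return Go source text for the zero value of go_type."""
    if go_type == "error":
        return "err"  # assumption: the error in scope is named err
    if go_type in ZERO_VALUES:
        return ZERO_VALUES[go_type]
    if go_type.startswith(NIL_PREFIXES):
        return "nil"
    return go_type + "{}"  # fall back to an empty composite literal
def error_return(return_types):
    """Build the return statement that bubbles the error up to the caller."""
    return "return " + ", ".join(zero_value(t) for t in return_types)
print(error_return(["*User", "string", "error"]))  # return nil, "", err
```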

00:09:22.800 --> 00:09:24.760
And since this is a template,

00:09:24.760 --> 00:09:27.720
I can go to the next function, do the same thing,

00:09:27.720 --> 00:09:29.560
next function, do the same thing,

00:09:29.560 --> 00:09:31.520
next function, do the same thing.

00:09:31.520 --> 00:09:34.360
Here is a really fascinating use case of Tree-sitter,

00:09:34.360 --> 00:09:36.320
structural editing.

00:09:36.320 --> 00:09:38.200
You might be aware of plugins like paredit,

00:09:38.200 --> 00:09:40.280
which seem to "know" your code.

00:09:40.280 --> 00:09:42.520
This sort of takes it onto another level.

00:09:42.520 --> 00:09:46.040
It is in its early stages, but what this lets you do

00:09:46.040 --> 00:09:48.920
is completely treat your code as an AST,

00:09:48.920 --> 00:09:52.000
and edit as if it's a tree instead of characters.

00:09:52.000 --> 00:09:54.640
I am not going to go much in depth into it,

00:09:54.640 --> 00:09:57.000
but if you're interested, there is a talk

00:09:57.000 --> 00:09:59.080
from last year's EmacsConf around it.

00:09:59.080 --> 00:10:02.320
I'm just going to end this with one last tiny thing

00:10:02.320 --> 00:10:04.920
that I found in the tree-sitter-extras package.

00:10:04.920 --> 00:10:07.600
It's this tiny macro called tree-sitter-save-excursion.

00:10:07.600 --> 00:10:11.240
It works pretty much like save-excursion, but better.

00:10:11.240 --> 00:10:13.400
It uses the Tree-sitter syntax tree

00:10:13.400 --> 00:10:14.800
instead of just the code

00:10:14.800 --> 00:10:16.720
to figure out where to restore the position.

00:10:16.720 --> 00:10:20.200
My main use case for this was with code formatters.

00:10:20.200 --> 00:10:22.080
Since the code moves around a lot

00:10:22.080 --> 00:10:23.160
when it gets formatted,

00:10:23.160 --> 00:10:25.000
save-excursion was completely useless,

00:10:25.000 --> 00:10:26.240
but this came in handy.
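
NOTE
Editor's sketch of the general idea only; this is not how
tree-sitter-save-excursion is implemented. Instead of remembering a raw
character offset, remember which syntactic unit the cursor is in and how far
into it, then look that unit up again after formatting. Whitespace-separated
tokens are used here as a crude stand-in for syntax nodes.
```python
import re
def capture(text, pos):
    """Remember the cursor as (token index, offset inside that token)."""
    for index, match in enumerate(re.finditer(r"\S+", text)):
        if match.start() <= pos < match.end():
            return index, pos - match.start()
    raise ValueError("cursor is not inside a token")
def restore(text, anchor):
    """Find the same token in the reformatted text and reapply the offset."""
    index, offset = anchor
    tokens = list(re.finditer(r"\S+", text))
    return tokens[index].start() + offset
before = "if x:\n        return   y"
after = "if x:\n    return y"  # same tokens, different whitespace
anchor = capture(before, before.index("y"))
print(after[restore(after, anchor)])  # y
```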

00:10:26.240 --> 00:10:28.120
I'll just leave you off with

00:10:28.120 --> 00:10:31.120
what the future of Tree-sitter looks like for Emacs.

00:10:31.120 --> 00:10:33.760
So far, every Tree-sitter related feature

00:10:33.760 --> 00:10:36.040
that I've talked about is powered by this library.

00:10:36.040 --> 00:10:42.320
But there is talk about Tree-sitter coming into the core.

00:10:42.320 --> 00:10:45.840
It will most probably be landing in Emacs 29,

00:10:45.840 --> 00:10:48.720
and if you want to check out the work on Tree-sitter

00:10:48.720 --> 00:10:51.200
in core Emacs, you can check out

00:10:51.200 --> 00:10:52.920
the feature/tree-sitter branch.

00:10:52.920 --> 00:10:56.640
You'll probably see more and more features and packages

00:10:56.640 --> 00:10:59.640
relying upon Tree-sitter, and even major modes

00:10:59.640 --> 00:11:01.560
being powered by Tree-sitter.

00:11:01.560 --> 00:11:03.880
And that's a wrap from me. Thank you.