WEBVTT captioned by sachac 00:00:00.000 --> 00:00:03.240 Hey everyone, my name is Abin Simon 00:00:03.240 --> 00:00:05.080 and this talk is about "Tree-sitter: 00:00:05.080 --> 00:00:08.200 Beyond Syntax Highlighting." 00:00:08.200 --> 00:00:10.720 For those who are not aware of what Tree-sitter is, 00:00:10.720 --> 00:00:11.720 let me give you a quick intro. 00:00:11.720 --> 00:00:17.120 Tree-sitter, at its core, is a parser generator tool 00:00:17.120 --> 00:00:19.440 and an incremental parsing library. 00:00:19.440 --> 00:00:22.000 What it essentially means is that it gives you 00:00:22.000 --> 00:00:23.154 an always up-to-date 00:00:23.155 --> 00:00:24.200 AST [abstract syntax tree] of your code. 00:00:24.200 --> 00:00:27.960 In the current Emacs frame, what you see to the right 00:00:27.960 --> 00:00:30.840 is the AST tree produced by Tree-sitter 00:00:30.840 --> 00:00:33.560 of the code that is on the left. 00:00:33.560 --> 00:00:37.000 For example, if you go to this "if" statement, 00:00:37.000 --> 00:00:38.840 you can see it goes here. 00:00:38.840 --> 00:00:41.440 It is also really good at handling errors. 00:00:41.440 --> 00:00:44.400 For example, if I were to delete this [if statement], 00:00:44.400 --> 00:00:47.960 it still parses out a tree as much as it can, 00:00:47.960 --> 00:00:50.280 but with an error node. 00:00:50.280 --> 00:00:51.760 Now let's see how we can query the tree 00:00:51.760 --> 00:00:54.440 to get the information that we need. 00:00:54.440 --> 00:01:01.480 Let's first try to get all the identifiers in the buffer. 00:01:01.480 --> 00:01:04.000 It highlights all the identifiers in the buffer, 00:01:04.000 --> 00:01:05.440 but let's say we want to get something 00:01:05.440 --> 00:01:07.280 a little more precise. 00:01:07.280 --> 00:01:10.400 Let's say we wanted to get this "i" here. 
00:01:10.400 --> 00:01:13.280 This, in our case, would be this identifier 00:01:13.280 --> 00:01:15.200 inside this assignment expression 00:01:15.200 --> 00:01:27.320 inside this "for" statement. 00:01:27.320 --> 00:01:29.920 We can write it out like this. 00:01:29.920 --> 00:01:31.880 I hope this gives you a basic idea 00:01:31.880 --> 00:01:34.480 of how Tree-sitter works and how you can query 00:01:34.480 --> 00:01:37.040 to get the information that you need. 00:01:37.040 --> 00:01:39.520 First of all, let's see how Tree-sitter can help us 00:01:39.520 --> 00:01:41.880 with syntax highlighting. 00:01:41.880 --> 00:01:46.480 This is the default syntax highlighting by Emacs for SQL. 00:01:46.480 --> 00:01:52.000 Now let's see how Tree-sitter helps. 00:01:52.000 --> 00:01:54.240 This is the syntax highlighting in Emacs 00:01:54.240 --> 00:01:56.760 with Tree-sitter enabled. 00:01:56.760 --> 00:01:58.240 You'll see that we're able to target 00:01:58.240 --> 00:02:01.240 a lot more things and highlight them. 00:02:01.240 --> 00:02:03.138 That said, you don't always have to 00:02:03.139 --> 00:02:04.200 highlight everything. 00:02:04.200 --> 00:02:15.640 I personally prefer a much simpler theme. 00:02:15.640 --> 00:02:17.880 Now let's see how Tree-sitter helps you simplify 00:02:17.880 --> 00:02:20.920 adding custom syntax highlighting to your code. 00:02:20.920 --> 00:02:22.200 This is a Python file which has 00:02:22.200 --> 00:02:25.640 a class and a few member functions. 00:02:25.640 --> 00:02:27.680 Anyone who has used Python will know that 00:02:27.680 --> 00:02:32.040 the "self" keyword, while it is passed in as an argument, 00:02:32.040 --> 00:02:34.240 has more meaning than that. 00:02:34.240 --> 00:02:35.480 Let's see if you can use Tree-sitter 00:02:35.480 --> 00:02:38.720 to highlight just the "self" keyword.
00:02:38.720 --> 00:02:40.400 If you look at the Tree-sitter tree, 00:02:40.400 --> 00:02:43.120 you can see that this is the first identifier 00:02:43.120 --> 00:02:45.520 in the list of parameters for a function definition. 00:02:45.520 --> 00:02:55.480 This is how you would query for the first identifier 00:02:55.480 --> 00:02:59.320 inside parameters inside a function definition. 00:02:59.320 --> 00:03:02.520 Now, if you see here, it also matches "cls", 00:03:02.520 --> 00:03:11.360 but let's restrict it to match just "self". 00:03:11.360 --> 00:03:14.200 Now we have a Tree-sitter query that identifies 00:03:14.200 --> 00:03:16.960 the first argument to the function definition 00:03:16.960 --> 00:03:19.640 and is also called "self". 00:03:19.640 --> 00:03:22.520 We can use this to apply custom highlighting onto this. 00:03:22.520 --> 00:03:25.000 This is pretty much all the code 00:03:25.000 --> 00:03:26.520 that you'll need to do this. 00:03:26.520 --> 00:03:29.240 The first block here is essentially to say to 00:03:29.240 --> 00:03:32.160 Tree-sitter to highlight anything with python.self 00:03:32.160 --> 00:03:35.720 with the face of custom-set. 00:03:35.720 --> 00:03:37.520 Now the second block here essentially is 00:03:37.520 --> 00:03:39.800 how we match for that. 00:03:39.800 --> 00:03:41.800 Now if you go back into a Python buffer 00:03:41.800 --> 00:03:44.680 and re-enable python-mode, we'll see that "self" 00:03:44.680 --> 00:03:47.120 is highlighted differently. 00:03:47.120 --> 00:03:48.880 How about creating text objects? 00:03:48.880 --> 00:03:50.440 Tree-sitter can help there too. 00:03:50.440 --> 00:03:53.080 For those who don't know, text objects 00:03:53.080 --> 00:03:54.440 are an idea that comes from Vim, 00:03:54.440 --> 00:03:57.760 and you can do things like select word, 00:03:57.760 --> 00:04:00.520 delete word, things like that. 00:04:00.520 --> 00:04:06.200 There are other text objects like line and paragraph.
00:04:06.200 --> 00:04:09.000 For each text object, you can have operations 00:04:09.000 --> 00:04:09.760 that are defined on them. 00:04:09.760 --> 00:04:13.600 For example, delete, copy, select, comment, 00:04:13.600 --> 00:04:16.400 all of these are operations that you can do. 00:04:16.400 --> 00:04:19.400 Let's try and use Tree-sitter to add more text objects. 00:04:19.400 --> 00:04:20.560 This is a plugin that I wrote 00:04:20.560 --> 00:04:25.000 which lets you add more text objects into Emacs. 00:04:25.000 --> 00:04:27.880 It helps you [add] code-aware text objects 00:04:27.880 --> 00:04:31.880 like functions, conditionals, loops, and such. 00:04:31.880 --> 00:04:34.360 Let's see an example scenario of how 00:04:34.360 --> 00:04:35.920 something like this could come in handy. 00:04:35.920 --> 00:04:39.280 For example, I can select inside this condition 00:04:39.280 --> 00:04:42.960 or inside this function and do things like that. 00:04:42.960 --> 00:04:44.520 Let's say I want to take this conditional, 00:04:44.520 --> 00:04:47.160 move to the next function, and create it here. 00:04:47.160 --> 00:04:49.640 What I would do is something like 00:04:49.640 --> 00:04:52.320 delete the conditional, move to the next function, 00:04:52.320 --> 00:04:56.240 create a conditional there, and paste. 00:04:56.240 --> 00:04:57.160 Let's try another example. 00:04:57.160 --> 00:05:01.360 Let's say I want to take this and move it to the end. 00:05:01.360 --> 00:05:02.960 If I had to do it without text objects, 00:05:02.960 --> 00:05:06.800 I'd probably have to go back to the previous comma, 00:05:06.800 --> 00:05:10.440 delete till the next comma, find the closing bracket, 00:05:10.440 --> 00:05:11.880 and paste before. 00:05:11.880 --> 00:05:14.040 That works, but let's see 00:05:14.040 --> 00:05:16.520 how Tree-sitter can simplify it.
00:05:16.520 --> 00:05:19.240 With Tree-sitter, I can say delete the argument, 00:05:19.240 --> 00:05:22.880 go to the end of the next argument, and then paste. 00:05:22.880 --> 00:05:25.280 Tree-sitter essentially helps Emacs 00:05:25.280 --> 00:05:27.240 understand the code better semantically. 00:05:27.240 --> 00:05:29.600 Here is yet another use case. 00:05:29.600 --> 00:05:31.480 I work at a remote company, 00:05:31.480 --> 00:05:33.440 and I often find myself being in a call 00:05:33.440 --> 00:05:35.400 with my teammates, explaining the code to them. 00:05:35.400 --> 00:05:38.000 And one thing that really comes in handy 00:05:38.000 --> 00:05:39.760 is the narrowing capability of Emacs. 00:05:39.760 --> 00:05:43.040 Specifically, the fancy-narrow package. 00:05:43.040 --> 00:05:44.840 I use it to narrow to just the function, 00:05:44.840 --> 00:05:48.760 or I could narrow to the conditional. 00:05:48.760 --> 00:05:51.520 Next on the list would be code folding. 00:05:51.520 --> 00:05:54.480 This is a package which uses Tree-sitter 00:05:54.480 --> 00:05:57.560 to improve the code folding functionalities of Emacs. 00:05:57.560 --> 00:06:00.200 Code folding has always been this thing 00:06:00.200 --> 00:06:02.280 that I've had a love-hate relationship with. 00:06:02.280 --> 00:06:04.280 It usually works most of the time, 00:06:04.280 --> 00:06:06.960 but then fails if the indentation is wrong 00:06:06.960 --> 00:06:09.160 or we do something weird with the arguments. 00:06:09.160 --> 00:06:11.680 But now with Tree-sitter in the mix, 00:06:11.680 --> 00:06:12.720 it's a lot more precise. 00:06:12.720 --> 00:06:17.040 I can fold comments, I can fold functions, 00:06:17.040 --> 00:06:20.480 I can fold conditionals. You get the idea. 00:06:20.480 --> 00:06:23.840 I work with Kubernetes, which means I end up 00:06:23.840 --> 00:06:28.080 having to write and read a lot of YAML files. 00:06:28.080 --> 00:06:31.840 And navigating big YAML files is a mess.
00:06:31.840 --> 00:06:35.760 The two main problems are, one, figuring out where I am, 00:06:35.760 --> 00:06:38.760 and two, navigating to where I want to be. 00:06:38.760 --> 00:06:41.760 Let's see how Tree-sitter can help us with both of these. 00:06:41.760 --> 00:06:43.840 This is an example YAML file. 00:06:43.840 --> 00:06:47.080 To be precise, this is the values file 00:06:47.080 --> 00:06:48.640 of the Redis Helm chart. 00:06:48.640 --> 00:06:52.240 I'm somewhere in the file on tag under image, 00:06:52.240 --> 00:06:54.880 but I don't know what this tag is for. 00:06:54.880 --> 00:06:57.240 But with the help of Tree-sitter, 00:06:57.240 --> 00:06:59.160 I've been able to add this information 00:06:59.160 --> 00:07:00.440 into my header line. 00:07:00.440 --> 00:07:02.960 If you look at the header line, 00:07:02.960 --> 00:07:05.880 you'll see that I'm under sentinel.image. 00:07:05.880 --> 00:07:08.800 Now let's see how this helps with navigation. 00:07:08.800 --> 00:07:12.680 Let's say I want to enable persistence on the master node. 00:07:12.680 --> 00:07:18.200 So with the help of Tree-sitter, 00:07:18.200 --> 00:07:20.400 I was able to enumerate every field 00:07:20.400 --> 00:07:22.200 that is available in this YAML file, 00:07:22.200 --> 00:07:24.520 and I can pass that information onto imenu, 00:07:24.520 --> 00:07:28.040 which I can then use to go to exactly where I want to. 00:07:28.040 --> 00:07:30.000 Also, since we're not dealing with 00:07:30.000 --> 00:07:32.600 any language-specific constructs, 00:07:32.600 --> 00:07:34.040 this is very easy to extend to 00:07:34.040 --> 00:07:35.760 other similar languages 00:07:35.760 --> 00:07:37.440 or config files in this case. 00:07:37.440 --> 00:07:39.520 So for example, this is a JSON file, 00:07:39.520 --> 00:07:44.800 and I can navigate to location or project. 00:07:44.800 --> 00:07:48.320 And just like in YAML, it shows me where I'm at.
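A minimal sketch of the idea described above: list every field of a config file as a dotted path (the kind of index that could be handed to imenu) and work out which path the cursor is under (the kind of breadcrumb shown in the header line). This is not the speaker's Emacs Lisp. It is plain Python over a hand-built tree, and the `Node` class, its `line` field, the two helper functions, and the sample line numbers are all assumptions made up for illustration. A real setup would read the keys and positions from the Tree-sitter syntax tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for one key in a parsed YAML/JSON tree."""
    key: str
    line: int  # line on which the key appears
    children: List["Node"] = field(default_factory=list)


def dotted_paths(nodes: List[Node], prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (dotted path, line) for every key, depth first."""
    for node in nodes:
        path = f"{prefix}.{node.key}" if prefix else node.key
        yield path, node.line
        yield from dotted_paths(node.children, path)


def breadcrumb(nodes: List[Node], line: int) -> Optional[str]:
    """Return the path of the nearest key at or before `line`.

    Simplification: assumes keys are stored in document order and treats
    the closest preceding key as the current position.
    """
    best = None
    for path, start in dotted_paths(nodes):
        if start <= line:
            best = path
    return best


# Sample tree loosely modelled on the keys mentioned in the talk.
tree = [
    Node("master", 1, [Node("persistence", 2, [Node("enabled", 3)])]),
    Node("sentinel", 4, [Node("image", 5, [Node("tag", 6)])]),
]

# Flat index of every field, suitable for a jump-to-entry menu.
index = list(dotted_paths(tree))
```

Because the walk only looks at keys and nesting, the same two functions would apply unchanged to JSON or Nix, as long as the tree is built from the matching grammar.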
00:07:48.320 --> 00:07:49.920 I'm in projects.name, 00:07:49.920 --> 00:07:52.880 or I'm inside projects.highlights. 00:07:52.880 --> 00:07:55.600 Or how about Nix? 00:07:55.600 --> 00:07:57.480 This is my home.nix file. 00:07:57.480 --> 00:08:01.040 Again, I can search for services, 00:08:01.040 --> 00:08:04.640 and this lists all the services that I've enabled. 00:08:04.640 --> 00:08:06.720 How about just services.description? 00:08:06.720 --> 00:08:08.160 So this is all the services 00:08:08.160 --> 00:08:10.480 that I've enabled and that have descriptions. 00:08:10.480 --> 00:08:12.720 Now that we have seen this for config files, 00:08:12.720 --> 00:08:15.040 let's see how similar things apply for code. 00:08:15.040 --> 00:08:16.760 Just like in config files, 00:08:16.760 --> 00:08:18.680 I can see which function I'm under, 00:08:18.680 --> 00:08:21.560 and if I go to the next function, it changes. 00:08:21.560 --> 00:08:23.960 Okay, here is something really awesome. 00:08:23.960 --> 00:08:26.600 This is probably one of my favorites, 00:08:26.600 --> 00:08:30.400 and one of the things that actually made me understand 00:08:30.400 --> 00:08:34.080 how powerful Tree-sitter is, and got me into it. 00:08:34.080 --> 00:08:35.680 I work with a lot of Go code, 00:08:35.680 --> 00:08:38.840 and anyone who has worked with Go will tell you 00:08:38.840 --> 00:08:41.040 how repetitive it is handling errors. 00:08:41.040 --> 00:08:42.800 For those who don't write Go, 00:08:42.800 --> 00:08:45.200 let me give you a rough idea of what I'm talking about. 00:08:45.200 --> 00:08:47.000 If you want to bubble up the error, 00:08:47.000 --> 00:08:49.920 the way you would do it is just to return the error 00:08:49.920 --> 00:08:51.400 to the function that called it. 00:08:51.400 --> 00:08:55.720 Over here, you can either return nil or an empty value, 00:08:55.720 --> 00:08:57.640 and at the end, you return error. 00:08:57.640 --> 00:09:00.200 Let's try and use Tree-sitter to do this.
00:09:00.200 --> 00:09:03.120 Using the help of Tree-sitter, let's make Emacs 00:09:03.120 --> 00:09:06.421 go back, figure out what the return arguments are, 00:09:06.422 --> 00:09:08.240 figure out what their default values are, 00:09:08.240 --> 00:09:11.480 and automatically fill in the return statement. 00:09:11.480 --> 00:09:13.040 It would look something like this. 00:09:13.040 --> 00:09:16.120 In my case, it filled in the complete form, 00:09:16.120 --> 00:09:18.320 it figured out what the return arguments are, 00:09:18.320 --> 00:09:19.320 what their types are, 00:09:19.320 --> 00:09:20.960 and what their default values are, 00:09:20.960 --> 00:09:22.800 and filled out the entire return. 00:09:22.800 --> 00:09:24.760 And since this is a template, 00:09:24.760 --> 00:09:27.720 I can go to the next function, do the same thing, 00:09:27.720 --> 00:09:29.560 next function, do the same thing, 00:09:29.560 --> 00:09:31.520 next function, do the same thing. 00:09:31.520 --> 00:09:34.360 Here is a really fascinating use case of Tree-sitter, 00:09:34.360 --> 00:09:36.320 structural editing. 00:09:36.320 --> 00:09:38.200 You might be aware of plugins like paredit, 00:09:38.200 --> 00:09:40.280 which seems to "know" your code. 00:09:40.280 --> 00:09:42.520 This sort of takes it onto another level. 00:09:42.520 --> 00:09:46.040 It is in its early stages, but what this lets you do 00:09:46.040 --> 00:09:48.920 is completely treat your code as an AST, 00:09:48.920 --> 00:09:52.000 and edit as if it's a tree instead of characters. 00:09:52.000 --> 00:09:54.640 I am not going to go much in depth into it, 00:09:54.640 --> 00:09:57.000 but if you're interested, there is a talk 00:09:57.000 --> 00:09:59.080 from last year's EmacsConf around it. 00:09:59.080 --> 00:10:02.320 I'm just going to end this with one last tiny thing 00:10:02.320 --> 00:10:04.920 that I found in the tree-sitter-extras package. 
00:10:04.920 --> 00:10:07.600 It's this tiny macro called tree-sitter-save-excursion. 00:10:07.600 --> 00:10:11.240 It works pretty much like save-excursion, but better. 00:10:11.240 --> 00:10:13.400 It uses the Tree-sitter syntax tree 00:10:13.400 --> 00:10:14.800 instead of just the code 00:10:14.800 --> 00:10:16.720 to figure out where to restore the position. 00:10:16.720 --> 00:10:20.200 My main use case for this was with code formatters. 00:10:20.200 --> 00:10:22.080 Since the code moves around a lot 00:10:22.080 --> 00:10:23.160 when it gets formatted, 00:10:23.160 --> 00:10:25.000 save-excursion was completely useless, 00:10:25.000 --> 00:10:26.240 but this came in handy. 00:10:26.240 --> 00:10:28.120 I'll just leave you off with 00:10:28.120 --> 00:10:31.120 what the future of Tree-sitter looks like for Emacs. 00:10:31.120 --> 00:10:33.760 So far, every Tree-sitter related feature 00:10:33.760 --> 00:10:36.040 that I've talked about is powered by this library. 00:10:36.040 --> 00:10:42.320 But there is talk about Tree-sitter coming into the core. 00:10:42.320 --> 00:10:45.840 It will most probably be landing in Emacs 29, 00:10:45.840 --> 00:10:48.720 and if you want to check out the work on Tree-sitter 00:10:48.720 --> 00:10:51.200 in core Emacs, you can check out 00:10:51.200 --> 00:10:52.920 the feature/tree-sitter branch. 00:10:52.920 --> 00:10:56.640 You'll probably see more and more features and packages 00:10:56.640 --> 00:10:59.640 relying upon Tree-sitter, and even major modes 00:10:59.640 --> 00:11:01.560 being powered by Tree-sitter. 00:11:01.560 --> 00:11:03.880 And that's a wrap from me. Thank you.
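Going back to the Go error-handling demo from earlier in the talk: the steps described there (find the function's return types, work out a default value for each, fill in the return statement) can be sketched roughly as follows. This is a Python illustration, not the speaker's template code. The function names `zero_value` and `error_return` are hypothetical, the list of type strings is assumed to have already been read from the enclosing function's result list (the part Tree-sitter would supply), and the error variable is assumed to be named `err`.

```python
from typing import Iterable

# Go's built-in numeric types all share the zero value 0.
NUMERIC = {
    "int", "int8", "int16", "int32", "int64",
    "uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
    "float32", "float64", "complex64", "complex128", "byte", "rune",
}

# Type spellings whose zero value is nil: pointers, slices, maps,
# channels, function types, and inline interface types.
NIL_PREFIXES = (
    "*", "[]", "map[", "chan ", "chan<-", "<-chan",
    "func(", "interface{", "interface {",
)


def zero_value(go_type: str) -> str:
    """Return Go source text for a default value of `go_type`."""
    go_type = go_type.strip()
    if go_type == "error":
        return "err"  # bubble up the error instead of returning nil
    if go_type in NUMERIC:
        return "0"
    if go_type == "string":
        return '""'
    if go_type == "bool":
        return "false"
    if go_type == "any" or go_type.startswith(NIL_PREFIXES):
        return "nil"
    # Fallback: assume a struct or array type, written as an empty literal.
    # This is wrong for named interfaces or named basic types.
    return go_type + "{}"


def error_return(result_types: Iterable[str]) -> str:
    """Build the return statement for a function with these result types."""
    values = ", ".join(zero_value(t) for t in result_types)
    return f"return {values}" if values else "return"


# Example: func load(id string) (*User, string, error)
statement = error_return(["*User", "string", "error"])
```

The fallback branch is the weak spot of a purely textual approach: telling a named struct from a named interface needs type information that a syntax tree alone does not carry.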