WEBVTT 00:00:01.520 --> 00:00:04.400 Hello, everyone! My name is Tuấn-Anh. 00:00:04.400 --> 00:00:07.200 I've been using Emacs for about 10 years. 00:00:07.200 --> 00:00:09.280 Today, I'm going to talk about tree-sitter, 00:00:09.280 --> 00:00:11.351 a new Emacs package that allows Emacs 00:00:11.351 --> 00:00:17.840 to parse multiple programming languages in real time. 00:00:17.840 --> 00:00:21.840 So what is the problem statement? 00:00:21.840 --> 00:00:24.131 In order to support programming features 00:00:24.131 --> 00:00:25.760 for a particular language, 00:00:25.760 --> 00:00:27.680 a text editor needs to have some degree 00:00:27.680 --> 00:00:29.679 of language understanding. 00:00:29.679 --> 00:00:31.840 Traditionally, text editors have relied 00:00:31.840 --> 00:00:34.960 very heavily on regular expressions for this. 00:00:34.960 --> 00:00:37.013 Emacs is no different. 00:00:37.013 --> 00:00:40.170 Most language major modes use regular expressions 00:00:40.170 --> 00:00:42.960 for syntax highlighting, code navigation, 00:00:42.960 --> 00:00:46.618 folding, indexing, and so on. 00:00:46.618 --> 00:00:50.559 Regular expressions are problematic for a couple of reasons. 00:00:50.559 --> 00:00:53.778 They're slow and inaccurate. 00:00:53.778 --> 00:00:56.800 They also make the code hard to read and write. 00:00:56.800 --> 00:01:01.199 Sometimes it's because the regular expressions themselves are very hairy, 00:01:01.199 --> 00:01:05.199 and sometimes because they are just not powerful enough. 00:01:05.199 --> 00:01:08.625 Some helper code is usually needed 00:01:08.625 --> 00:01:11.200 to parse more intricate language features. 00:01:11.200 --> 00:01:16.159 That also illustrates the core problem with regular expressions: 00:01:16.159 --> 00:01:21.119 they are not powerful enough to parse programming languages.
00:01:21.119 --> 00:01:25.040 An example feature that regular expressions cannot handle very well 00:01:25.040 --> 00:01:28.320 is string interpolation, which is a very common feature 00:01:28.320 --> 00:01:31.680 in many modern programming languages. 00:01:31.680 --> 00:01:34.079 It would be much nicer if Emacs somehow 00:01:34.079 --> 00:01:39.520 had a structural understanding of source code, like IDEs do. 00:01:39.520 --> 00:01:41.981 There have been multiple efforts 00:01:41.981 --> 00:01:45.280 to bring this kind of programming language understanding into Emacs. 00:01:45.280 --> 00:01:47.119 There are language-specific parsers 00:01:47.119 --> 00:01:48.640 written in Elisp 00:01:48.640 --> 00:01:50.675 that can be thought of 00:01:50.675 --> 00:01:51.989 as the next logical step of the glue code 00:01:51.989 --> 00:01:53.856 on top of regular expressions, 00:01:53.856 --> 00:01:57.356 moving from partial, local pattern recognition 00:01:57.356 --> 00:01:59.840 to a full-fledged parser. 00:01:59.840 --> 00:02:02.023 The most prominent example of this approach 00:02:02.023 --> 00:02:06.479 is probably the famous js2-mode. 00:02:06.479 --> 00:02:10.080 However, this approach has several issues. 00:02:10.080 --> 00:02:12.606 Parsing is computationally expensive, 00:02:12.606 --> 00:02:16.800 and Emacs Lisp is not good at that kind of stuff. 00:02:16.800 --> 00:02:19.156 Furthermore, maintenance is very troublesome. 00:02:19.156 --> 00:02:22.160 In order to work on these parsers, 00:02:22.160 --> 00:02:24.239 first, you have to know Elisp well enough, 00:02:24.239 --> 00:02:26.606 and then you have to be comfortable with 00:02:26.606 --> 00:02:29.739 writing a recursive descent parser, 00:02:29.739 --> 00:02:34.000 while constantly keeping up with changes to the language itself, 00:02:34.000 --> 00:02:36.356 which may be evolving very quickly, 00:02:36.356 --> 00:02:39.360 like JavaScript, for example.
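The string-interpolation problem can be made concrete with a short Python sketch (Python is used only for illustration; the packages discussed in the talk are Emacs Lisp). It assumes JavaScript-style `${...}` template holes. The regular expression is a hypothetical naive pattern whose body is "anything except `}`", so it stops at the first closing brace. The hand-written scanner counts brace depth instead, which is the kind of glue code that, extended far enough, grows into the full parser described above.

```python
import re

# Hypothetical naive pattern for JavaScript-style `${ ... }` interpolations:
# the body is "anything except a closing brace", so it stops at the first `}`.
NAIVE = re.compile(r"\$\{([^}]*)\}")


def interpolations(template_body: str) -> list[str]:
    """Return the source text inside each top-level `${ ... }`.

    Tracks brace depth, so `{ ... }` nested inside the interpolation is
    handled. Simplification: braces inside string literals within the
    interpolation are not recognized.
    """
    found = []
    i = 0
    while i < len(template_body):
        if template_body.startswith("${", i):
            depth, j = 1, i + 2
            while j < len(template_body) and depth:
                if template_body[j] == "{":
                    depth += 1
                elif template_body[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError(f"unterminated interpolation at offset {i}")
            # j now points just past the matching `}`.
            found.append(template_body[i + 2 : j - 1])
            i = j
        else:
            i += 1
    return found


body = "total: ${ {a: 1, b: 2}.a + 3 } items"
print(NAIVE.findall(body))   # cut off inside the object literal
print(interpolations(body))  # the whole embedded expression
```

A real parser would also have to recognize strings and nested templates inside the hole, which is exactly where hand-written Elisp parsers become expensive to write and maintain.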
00:02:39.360 --> 00:02:42.373 Together, these constraints significantly reduce 00:02:42.373 --> 00:02:45.680 the pool of potential maintainers. 00:02:45.680 --> 00:02:47.760 The biggest issue, though, in my opinion, 00:02:47.760 --> 00:02:52.139 is the lack of a set of generic, reusable APIs. 00:02:52.139 --> 00:02:54.319 This makes them very hard to use 00:02:54.319 --> 00:02:55.920 for minor modes that want to deal with 00:02:55.920 --> 00:02:59.920 cross-cutting concerns across multiple languages. 00:02:59.920 --> 00:03:01.760 The other approach, which has been 00:03:01.760 --> 00:03:04.319 gaining a lot of momentum in recent years, 00:03:04.319 --> 00:03:06.560 is externalizing language understanding 00:03:06.560 --> 00:03:08.159 to another process 00:03:08.159 --> 00:03:12.239 via the Language Server Protocol, or LSP. 00:03:12.239 --> 00:03:16.560 This second approach is actually a very interesting one. 00:03:16.560 --> 00:03:18.400 By decoupling language understanding 00:03:18.400 --> 00:03:21.280 from the editing facility itself, 00:03:21.280 --> 00:03:25.120 LSP servers can attract a lot more contributors, 00:03:25.120 --> 00:03:27.189 which makes maintenance easier. 00:03:27.189 --> 00:03:32.400 However, they also have several issues of their own. 00:03:32.400 --> 00:03:34.089 Being separate processes, 00:03:34.089 --> 00:03:37.073 they are usually more resource-intensive, 00:03:37.073 --> 00:03:39.920 and depending on the language, 00:03:39.920 --> 00:03:42.159 the LSP server itself can bring with it 00:03:42.159 --> 00:03:44.640 a host of additional dependencies 00:03:44.640 --> 00:03:50.640 external to Emacs, which may be messy to install and manage. 00:03:50.640 --> 00:03:55.120 Furthermore, JSON-RPC has pretty high latency. 00:03:55.120 --> 00:03:57.840 For one-off tasks like jumping to source 00:03:57.840 --> 00:04:00.879 or on-demand completion, it's great.
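To make the round trip concrete: LSP messages are JSON-RPC payloads preceded by a `Content-Length` header, so every request, including a highlighting request such as `textDocument/semanticTokens/full`, is serialized, written to a pipe, read and parsed by the server, and answered the same way. The Python sketch below shows only that framing step. The file URI is a placeholder, and the decoder assumes a single complete message whose only header is `Content-Length`.

```python
import json


def frame(message: dict) -> bytes:
    """Serialize a JSON-RPC message and prepend the LSP base-protocol header."""
    body = json.dumps(message).encode("utf-8")
    # Content-Length counts bytes of the UTF-8 body, not characters.
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def unframe(data: bytes) -> dict:
    """Parse one complete framed message.

    Simplification: assumes Content-Length is the only header and that
    `data` holds exactly one message.
    """
    header, _, body = data.partition(b"\r\n\r\n")
    length = int(header.split(b":", 1)[1].decode("ascii").strip())
    if len(body) != length:
        raise ValueError("truncated or over-long message")
    return json.loads(body)


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/semanticTokens/full",
    "params": {"textDocument": {"uri": "file:///tmp/example.py"}},  # placeholder URI
}
wire = frame(request)
assert unframe(wire) == request
```

Each step is cheap on its own; the point is that highlighting on every keystroke pays for the whole serialize, transfer, and parse cycle in both directions, on top of the server's own work.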
00:04:00.879 --> 00:04:03.040 But for things like code highlighting, 00:04:03.040 --> 00:04:06.000 the latency is just too much. 00:04:06.000 --> 00:04:08.319 I was using Rust, and I was following the 00:04:08.319 --> 00:04:11.760 community effort to improve its IDE support, 00:04:11.760 --> 00:04:15.760 hoping to integrate some of that into Emacs itself. 00:04:15.760 --> 00:04:19.759 Then I heard someone from the community mention tree-sitter, 00:04:19.759 --> 00:04:23.360 and I decided to check it out. 00:04:23.360 --> 00:04:28.720 Basically, tree-sitter is an incremental parsing library and a parser generator. 00:04:28.720 --> 00:04:33.040 It was introduced by the Atom editor in 2018. 00:04:33.040 --> 00:04:35.923 Besides Atom, it is also being integrated 00:04:35.923 --> 00:04:37.623 into the Neovim editor, 00:04:37.623 --> 00:04:41.040 and GitHub is using it to power 00:04:41.040 --> 00:04:42.423 their source code analysis 00:04:42.423 --> 00:04:45.840 and navigation features. 00:04:45.840 --> 00:04:48.639 It is written in C and can be compiled 00:04:48.639 --> 00:04:50.623 for all major platforms. 00:04:50.623 --> 00:04:53.120 It can even be compiled 00:04:53.120 --> 00:04:55.323 to WebAssembly to run on the web. 00:04:55.323 --> 00:05:00.800 That's how GitHub is using it on their website. 00:05:00.800 --> 00:05:05.840 So why is tree-sitter an interesting solution to this problem? 00:05:05.840 --> 00:05:10.000 There are multiple features that make it an attractive option. 00:05:10.000 --> 00:05:11.839 Firstly, it is designed to be fast. 00:05:11.839 --> 00:05:13.680 It is incremental: 00:05:13.680 --> 00:05:15.680 the initial parse of a typical big file 00:05:15.680 --> 00:05:18.160 can take tens of milliseconds, 00:05:18.160 --> 00:05:20.240 while subsequent incremental re-parses 00:05:20.240 --> 00:05:22.560 are sub-millisecond.
00:05:22.560 --> 00:05:26.240 It achieves this by using structural sharing, 00:05:26.240 --> 00:05:29.360 meaning it replaces only the affected nodes 00:05:29.360 --> 00:05:32.960 in the old tree and reuses the rest. 00:05:32.960 --> 00:05:37.120 Also, because it runs in the same process, unlike LSP, 00:05:37.120 --> 00:05:40.639 it has much lower latency. 00:05:40.639 --> 00:05:44.960 Secondly, it provides a uniform programming interface. 00:05:44.960 --> 00:05:47.039 The same data structures and functions 00:05:47.039 --> 00:05:50.400 work on parse trees of different languages. 00:05:50.400 --> 00:05:52.160 Syntax nodes of different languages 00:05:52.160 --> 00:05:54.160 differ only in their types 00:05:54.160 --> 00:05:55.723 and their possible child nodes. 00:05:55.723 --> 00:06:02.240 This is a big advantage over language-specific parsers. 00:06:02.240 --> 00:06:06.880 Thirdly, it's written in self-contained, embeddable C. 00:06:06.880 --> 00:06:11.723 As I mentioned previously, it can even be compiled to WebAssembly. 00:06:11.723 --> 00:06:16.106 This makes integrating it into various editors quite easy, 00:06:16.106 --> 00:06:22.880 without having to install any external dependencies. 00:06:22.880 --> 00:06:25.503 One thing that is not mentioned here 00:06:25.503 --> 00:06:28.000 is that, because tree-sitter is a parser generator, 00:06:28.000 --> 00:06:31.039 its grammars are declarative. 00:06:31.039 --> 00:06:34.880 Together with tree-sitter being editor-independent, 00:06:34.880 --> 00:06:39.139 this makes the pool of potential contributors much larger. 00:06:39.139 --> 00:06:45.520 So I was convinced that tree-sitter was a good fit for Emacs. 00:06:45.520 --> 00:06:48.000 Last year, I started writing the bindings 00:06:48.000 --> 00:06:53.280 using the dynamic module support introduced in Emacs 25.
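Structural sharing can be sketched with immutable nodes in Python. This is a toy model, not tree-sitter's C data structures (which also track byte ranges and decide during re-parsing which subtrees can be reused): replacing one leaf rebuilds only the nodes on the path from the root to that leaf, and every other subtree in the new tree is the very same object as in the old one. The node type names are borrowed from tree-sitter's Python grammar purely as labels.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Immutable toy syntax node: leaves carry text, inner nodes carry children."""
    type: str
    children: tuple[Node, ...] = ()
    text: str = ""


def replace_at(node: Node, path: tuple[int, ...], new_leaf: Node) -> Node:
    """Return a new tree in which the node reached by `path` is `new_leaf`.

    Only the nodes along `path` are rebuilt; every other subtree is reused
    as-is (the same Python object), which is the essence of structural sharing.
    """
    if not path:
        return new_leaf
    index, rest = path[0], path[1:]
    kids = list(node.children)
    kids[index] = replace_at(kids[index], rest, new_leaf)
    return Node(node.type, tuple(kids), node.text)


old = Node("module", (
    Node("function_definition", (Node("identifier", text="f"),)),
    Node("expression_statement", (Node("integer", text="1"),)),
))
# Simulate an edit that changes the literal 1 into 2.
new = replace_at(old, (1, 0), Node("integer", text="2"))
print(new.children[0] is old.children[0])  # the untouched subtree is shared
```

Because unchanged subtrees are shared rather than copied, the cost of producing the new tree grows with the depth of the edit, not with the size of the file, and the old tree stays valid for anyone still holding it.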
00:06:53.280 --> 00:06:58.479 Being a dynamic module means there is platform-specific native code involved, 00:06:58.479 --> 00:07:00.560 but since there are pre-compiled binaries 00:07:00.560 --> 00:07:02.880 for the three major platforms, 00:07:02.880 --> 00:07:04.706 it should work in most places. 00:07:04.706 --> 00:07:09.440 Currently, the core functionality is in pretty good shape. 00:07:09.440 --> 00:07:12.560 Syntax highlighting is working nicely. 00:07:12.560 --> 00:07:16.080 The whole thing is split into three packages. 00:07:16.080 --> 00:07:20.319 tree-sitter is the main package that other packages should depend on. 00:07:20.319 --> 00:07:22.800 tree-sitter-langs is the language bundle 00:07:22.800 --> 00:07:24.000 that includes support 00:07:24.000 --> 00:07:27.199 for the most common languages. 00:07:27.199 --> 00:07:32.160 And finally, the core APIs are in the package tsc, 00:07:32.160 --> 00:07:36.160 which stands for tree-sitter-core. 00:07:36.160 --> 00:07:38.800 It is an implicit dependency of the 00:07:38.800 --> 00:07:43.520 tree-sitter package. 00:07:43.520 --> 00:07:47.520 The main package includes the minor mode tree-sitter-mode. 00:07:47.520 --> 00:07:52.560 This provides the base for other major or minor modes to build on. 00:07:52.560 --> 00:07:54.839 Using Emacs's change-tracking hooks, 00:07:54.839 --> 00:07:57.073 it enables incremental parsing 00:07:57.073 --> 00:08:00.800 and provides a syntax tree that is always up to date 00:08:00.800 --> 00:08:04.080 after any edit in a buffer. 00:08:04.080 --> 00:08:06.223 There is also a basic debug mode 00:08:06.223 --> 00:08:10.080 that shows the parse tree in another buffer. 00:08:10.080 --> 00:08:13.360 Here is a quick demo. 00:08:13.360 --> 00:08:15.673 Here I'm in an empty Python buffer 00:08:15.673 --> 00:08:17.520 with tree-sitter enabled. 00:08:17.520 --> 00:08:19.440 I'm going to turn on the debug mode to 00:08:19.440 --> 00:08:26.560 see the parse tree.
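The change-tracking step can be sketched as a small conversion. Emacs calls each function in `after-change-functions` with the start and end of the changed text and the length of the text it replaced, while tree-sitter's C API describes an edit with a `TSInputEdit` whose offsets are 0-based bytes (`start_byte`, `old_end_byte`, `new_end_byte`, plus row/column points that are omitted here). The Python helpers below (`InputEdit`, `byte_offset`, `edit_from_change`) are hypothetical illustrations of that conversion, not the package's actual code; in Emacs, the old text would have to be captured before the change, for example from `before-change-functions`.

```python
from dataclasses import dataclass


@dataclass
class InputEdit:
    """Byte-offset part of an edit; field names follow tree-sitter's TSInputEdit."""
    start_byte: int
    old_end_byte: int
    new_end_byte: int


def byte_offset(text: str, pos: int) -> int:
    """0-based UTF-8 byte offset of the 1-based, Emacs-style character position `pos`."""
    return len(text[: pos - 1].encode("utf-8"))


def edit_from_change(
    old_text: str, new_text: str, beg: int, end: int, old_len: int
) -> InputEdit:
    """Build an edit from the (beg, end, old_len) triple of `after-change-functions`.

    beg and end are 1-based character positions in the new text; old_len is
    the number of characters the change replaced. Hypothetical helper: in
    Emacs, `old_text` would have to be captured before the change happened.
    """
    return InputEdit(
        # Text before `beg` is identical in both versions, so either works here.
        start_byte=byte_offset(new_text, beg),
        old_end_byte=byte_offset(old_text, beg + old_len),
        new_end_byte=byte_offset(new_text, end),
    )


# Replace the "a" in s = 'a' with "é" (one character, two UTF-8 bytes).
print(edit_from_change("s = 'a'", "s = 'é'", beg=6, end=7, old_len=1))
```

The `é` example shows why the conversion matters: the edit spans one character in both versions, but the new end lands two bytes after the start while the old end is only one byte after it.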
00:08:26.560 --> 00:08:28.106 Since the buffer is empty, 00:08:28.106 --> 00:08:30.423 there is only one node in the syntax tree: 00:08:30.423 --> 00:08:33.279 the top-level module node. 00:08:33.279 --> 00:09:11.040 Let's try typing some code. 00:09:11.040 --> 00:09:14.640 As you can see, as I type into the Python buffer, 00:09:14.640 --> 00:09:19.120 the syntax tree updates in real time. 00:09:19.120 --> 00:09:22.039 The other minor mode included in the main package 00:09:22.039 --> 00:09:24.389 is tree-sitter-hl-mode. 00:09:24.389 --> 00:09:26.349 It overrides font-lock mode 00:09:26.349 --> 00:09:28.480 and provides its own set of faces 00:09:28.480 --> 00:09:30.139 and customization options. 00:09:30.139 --> 00:09:32.800 It is query-driven. 00:09:32.800 --> 00:09:36.240 That means that, instead of regular expressions, 00:09:36.240 --> 00:09:39.518 it uses a Lisp-like query language 00:09:39.518 --> 00:09:40.320 to map syntax nodes 00:09:40.320 --> 00:09:41.923 to highlighting faces. 00:09:41.923 --> 00:09:45.760 I'm going to open a Python file with small snippets 00:09:45.760 --> 00:09:54.320 that showcase syntax highlighting. 00:09:54.320 --> 00:09:55.920 So this is the default highlighting 00:09:55.920 --> 00:10:00.880 provided by python-mode. 00:10:00.880 --> 00:10:04.640 This is the highlighting enabled by tree-sitter.
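The idea that capture names, rather than regular expressions, decide which face a span gets can be sketched in a few lines of Python. The `hl-face:` prefix, the `faces_for_captures` helper, and the fallback from a dotted name such as `function.call` to its parent `function` are all assumptions made for this illustration; the real package's face names and lookup rules may differ.

```python
# Hypothetical face-name prefix chosen for this sketch.
FACE_PREFIX = "hl-face:"


def faces_for_captures(
    captures: list[tuple[str, int, int]], known_faces: set[str]
) -> list[tuple[str, int, int]]:
    """Turn (capture_name, start, end) triples into (face, start, end) triples.

    A dotted capture such as "function.call" falls back to its parent
    ("function") when no face exists for the full name; a capture with no
    matching face at all is skipped.
    """
    spans = []
    for name, start, end in captures:
        parts = name.split(".")
        while parts:
            face = FACE_PREFIX + ".".join(parts)
            if face in known_faces:
                spans.append((face, start, end))
                break
            parts.pop()  # try the next, less specific name
    return spans


known = {"hl-face:function", "hl-face:string", "hl-face:keyword"}
captures = [("function.call", 0, 5), ("string", 6, 9), ("punctuation", 9, 10)]
print(faces_for_captures(captures, known))
```

Keeping the mapping in data like this is what makes the highlighting customizable: changing how something looks means editing a query or a face, not rewriting a regular expression.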
00:10:04.640 --> 00:10:07.680 As you can see, string interpolation 00:10:07.680 --> 00:10:11.680 and decorators are highlighted correctly. 00:10:11.680 --> 00:10:17.440 Function calls are also highlighted. 00:10:17.440 --> 00:10:20.240 You can also note that property 00:10:20.240 --> 00:10:21.839 accessors 00:10:21.839 --> 00:10:24.640 and property assignments are highlighted 00:10:24.640 --> 00:10:27.440 differently. 00:10:27.440 --> 00:10:29.360 What I like the most about this is that 00:10:29.360 --> 00:10:30.880 new bindings are consistently 00:10:30.880 --> 00:10:32.640 highlighted. 00:10:32.640 --> 00:10:36.320 This includes local variables, 00:10:36.320 --> 00:10:39.760 function parameters, and property 00:10:39.760 --> 00:10:45.760 mutations. 00:10:45.760 --> 00:10:48.000 Before going through the tree queries 00:10:48.000 --> 00:10:49.279 and the syntax highlighting 00:10:49.279 --> 00:10:51.680 customization options, 00:10:51.680 --> 00:10:53.760 let's take a brief look at the core data 00:10:53.760 --> 00:10:55.040 structures and functions 00:10:55.040 --> 00:10:58.079 that tree-sitter provides. 00:10:58.079 --> 00:10:59.839 Parsing is done with the help of a 00:10:59.839 --> 00:11:02.240 generic parser object. 00:11:02.240 --> 00:11:04.160 A single parser object can be used to 00:11:04.160 --> 00:11:06.000 parse different languages 00:11:06.000 --> 00:11:08.320 by passing different language objects to 00:11:08.320 --> 00:11:09.279 it. 00:11:09.279 --> 00:11:10.880 The language objects themselves are 00:11:10.880 --> 00:11:14.079 loaded from shared libraries. 00:11:14.079 --> 00:11:16.079 Since tree-sitter-mode already handles 00:11:16.079 --> 00:11:17.360 the parsing part, 00:11:17.360 --> 00:11:19.440 we will instead focus on the functions 00:11:19.440 --> 00:11:20.800 that inspect nodes 00:11:20.800 --> 00:11:25.279 in the resulting parse tree. 00:11:25.279 --> 00:11:27.200 We can ask tree-sitter what the 00:11:27.200 --> 00:11:44.240 syntax node at point is.
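What "the syntax node at point" means can be sketched with a toy tree in Python: starting from the root, descend into whichever child's half-open range contains the position, until no child does. The `Node` class, its `named` flag, and `node_at` are hypothetical stand-ins for the node functions used in the demo (type, text, position, parent), and the hand-built tree only approximates what tree-sitter's Python grammar produces for `x < 1`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Toy syntax node covering the half-open source range [start, end)."""
    type: str
    start: int
    end: int
    named: bool = True
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, *kids: Node) -> Node:
        """Attach children in order and set their parent links."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def node_at(root: Node, pos: int) -> Node:
    """Return the smallest node whose range contains `pos`."""
    node = root
    while True:
        for child in node.children:
            if child.start <= pos < child.end:
                node = child
                break
        else:  # no child contains pos, so `node` is the innermost match
            return node


source = "x < 1"
root = Node("module", 0, 5).add(
    Node("expression_statement", 0, 5).add(
        Node("comparison_operator", 0, 5).add(
            Node("identifier", 0, 1),
            Node("<", 2, 3, named=False),  # anonymous node: the operator token
            Node("integer", 4, 5),
        )
    )
)
hit = node_at(root, 2)
print(hit.type, hit.named, source[hit.start:hit.end], hit.parent.type)
```

Position 3 (the space) falls inside no leaf, so the lookup stops at `comparison_operator`; asking at neighboring positions can therefore return nodes of quite different granularity.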
00:11:44.240 --> 00:11:47.200 It's an opaque object, so this is not 00:11:47.200 --> 00:11:48.480 very useful. 00:11:48.480 --> 00:12:03.760 We can instead ask what its type is. 00:12:03.760 --> 00:12:06.560 So its type is the symbol 00:12:06.560 --> 00:12:08.959 comparison_operator. 00:12:08.959 --> 00:12:11.600 In tree-sitter, there are two kinds of nodes: 00:12:11.600 --> 00:12:13.680 anonymous nodes and named nodes. 00:12:13.680 --> 00:12:15.519 Anonymous nodes correspond to simple 00:12:15.519 --> 00:12:17.040 grammar elements, 00:12:17.040 --> 00:12:19.839 like keywords, operators, punctuation, and 00:12:19.839 --> 00:12:21.279 so on. 00:12:21.279 --> 00:12:24.160 Named nodes, on the other hand, are grammar 00:12:24.160 --> 00:12:25.920 elements that are interesting enough on 00:12:25.920 --> 00:12:26.639 their own 00:12:26.639 --> 00:12:30.320 to have a name, like an identifier, an 00:12:30.320 --> 00:12:31.839 expression, 00:12:31.839 --> 00:12:35.440 or a function definition. 00:12:35.440 --> 00:12:37.760 Named node types are symbols, while 00:12:37.760 --> 00:12:42.639 anonymous node types are strings. 00:12:42.639 --> 00:12:46.320 For example, if we are on this 00:12:46.320 --> 00:12:49.760 comparison operator, 00:12:49.760 --> 00:12:55.920 the node type should be a string. 00:12:55.920 --> 00:12:57.920 We can also get other information about 00:12:57.920 --> 00:12:58.959 the node, 00:12:58.959 --> 00:13:09.680 for example, what its text is, 00:13:09.680 --> 00:13:20.800 or where it is in the buffer, 00:13:20.800 --> 00:13:43.199 or what its parent is. 00:13:43.199 --> 00:13:46.160 There are many other APIs to query 00:13:46.160 --> 00:13:46.839 node 00:13:46.839 --> 00:13:52.639 properties. 00:13:52.639 --> 00:13:54.399 Tree-sitter allows searching for 00:13:54.399 --> 00:13:58.240 structural patterns within a parse tree. 00:13:58.240 --> 00:14:01.440 It does so through a Lisp-like language. 00:14:01.440 --> 00:14:03.519 This language supports pattern matching 00:14:03.519 --> 00:14:04.639
by node types, 00:14:04.639 --> 00:14:07.760 field names, and predicates. 00:14:07.760 --> 00:14:10.079 It also allows capturing nodes for 00:14:10.079 --> 00:14:12.639 further processing. 00:14:12.639 --> 00:14:37.680 Let's look at some examples. 00:14:37.680 --> 00:14:41.040 So in this very simple query, we just 00:14:41.040 --> 00:14:43.839 try to highlight all the identifiers in 00:14:43.839 --> 00:14:49.040 the buffer. 00:14:49.040 --> 00:14:51.920 This @ sign tells tree-sitter to capture a 00:14:51.920 --> 00:14:53.120 node. 00:14:53.120 --> 00:14:55.839 In the context of the query builder, it's 00:14:55.839 --> 00:14:57.360 not very important, 00:14:57.360 --> 00:15:00.320 but in a normal highlighting query, this 00:15:00.320 --> 00:15:01.760 will determine 00:15:01.760 --> 00:15:06.639 the face used to highlight the node. 00:15:06.639 --> 00:15:08.800 Suppose we want to capture all the 00:15:08.800 --> 00:15:10.320 function names 00:15:10.320 --> 00:15:13.519 instead of just any identifier. 00:15:13.519 --> 00:15:29.440 We can improve the query like this. 00:15:29.440 --> 00:15:31.600 This will highlight the whole 00:15:31.600 --> 00:15:32.639 definition, 00:15:32.639 --> 00:15:35.519 but we only want to capture the function 00:15:35.519 --> 00:15:36.399 name, 00:15:36.399 --> 00:15:39.600 which means the identifier 00:15:39.600 --> 00:15:42.800 here. So we 00:15:42.800 --> 00:15:46.320 move the capture to after the identifier 00:15:46.320 --> 00:15:49.600 node. 00:15:49.600 --> 00:15:51.759 If we want to capture the class names as 00:15:51.759 --> 00:15:52.959 well, 00:15:52.959 --> 00:16:10.079 we just add another pattern. 00:16:10.079 --> 00:16:20.320 Let's look at a more practical example. 00:16:20.320 --> 00:16:22.959 Here we can see that single-quoted 00:16:22.959 --> 00:16:23.759 strings and 00:16:23.759 --> 00:16:25.600 double-quoted strings are highlighted 00:16:25.600 --> 00:16:27.279 the same, 00:16:27.279 --> 00:16:30.399 but in some places, 00:16:30.399 -->
00:16:33.440 because of some coding conventions, 00:16:33.440 --> 00:16:35.440 it may be desirable to highlight them 00:16:35.440 --> 00:16:37.279 differently. For example, if 00:16:37.279 --> 00:16:39.680 the string is single-quoted, we may want 00:16:39.680 --> 00:16:40.880 to highlight it 00:16:40.880 --> 00:16:44.399 as a constant. 00:16:44.399 --> 00:16:46.160 Let's see whether we can 00:16:46.160 --> 00:16:47.600 distinguish these 00:16:47.600 --> 00:16:56.240 two cases. 00:16:56.240 --> 00:17:00.639 So here we get all the strings. 00:17:00.639 --> 00:17:04.079 If we want to know whether they are single-quoted 00:17:04.079 --> 00:17:04.559 or 00:17:04.559 --> 00:17:08.799 double-quoted strings, 00:17:08.799 --> 00:17:11.039 we can try looking at the first 00:17:11.039 --> 00:17:12.480 character 00:17:12.480 --> 00:17:15.280 of the string, I mean the first character 00:17:15.280 --> 00:17:16.720 of the node, 00:17:16.720 --> 00:17:19.360 to check whether it's a single quote or 00:17:19.360 --> 00:17:33.600 a double quote. 00:17:33.600 --> 00:17:36.080 Yeah, so for that we use 00:17:36.080 --> 00:17:36.799 tree-sitter's 00:17:36.799 --> 00:17:40.160 support for predicates. In this case, 00:17:40.160 --> 00:17:43.360 we use a match predicate 00:17:43.360 --> 00:17:46.080 to check whether the string, or rather the 00:17:46.080 --> 00:17:46.799 node, 00:17:46.799 --> 00:17:50.320 starts with a single quote. And with this 00:17:50.320 --> 00:17:51.280 pattern, 00:17:51.280 --> 00:17:58.840 we only capture the single-quoted 00:17:58.840 --> 00:18:00.400 strings. 00:18:00.400 --> 00:18:03.760 Let's try to give them a different face. 00:18:03.760 --> 00:18:13.039 So we copy the pattern, 00:18:13.039 --> 00:18:18.640 and we add this pattern 00:18:18.640 --> 00:18:25.120 pop item only. 00:18:25.120 --> 00:18:28.400 But we also want to give the 00:18:28.400 --> 00:18:31.440 capture a different name. 00:18:31.440 --> 00:18:40.840 Let's say we want to highlight it as a 00:18:40.840 --> 00:18:46.559
keyword. 00:18:46.559 --> 00:19:06.320 And now if we refresh the buffer, 00:19:06.320 --> 00:19:08.799 we see that single-quoted strings are 00:19:08.799 --> 00:19:10.320 highlighted as 00:19:10.320 --> 00:19:14.400 keywords. 00:19:14.400 --> 00:19:16.400 The highlighting patterns can also be 00:19:16.400 --> 00:19:19.200 set for a single project 00:19:19.200 --> 00:19:23.440 using directory-local variables. 00:19:23.440 --> 00:19:26.880 For example, let's take a look at 00:19:26.880 --> 00:19:35.760 the Emacs source code. 00:19:35.760 --> 00:19:40.400 So in Emacs's C source, there are a lot of 00:19:40.400 --> 00:19:43.760 uses of these DEFUN macros 00:19:43.760 --> 00:19:47.679 to define functions, 00:19:47.679 --> 00:19:51.200 and you can see 00:19:51.200 --> 00:19:53.520 this is actually the function name, but 00:19:53.520 --> 00:19:55.760 it's highlighted as a 00:19:55.760 --> 00:19:59.120 string. So what we want 00:19:59.120 --> 00:20:03.679 is to somehow recognize this pattern 00:20:03.679 --> 00:20:07.600 and highlight it, 00:20:07.600 --> 00:20:11.280 or rather highlight this part, 00:20:11.280 --> 00:20:14.559 with the function face instead. 00:20:14.559 --> 00:20:17.679 In order to do that, 00:20:17.679 --> 00:20:20.240 we put a pattern in this project's 00:20:20.240 --> 00:20:21.760 directory-local 00:20:21.760 --> 00:20:31.760 settings file. 00:20:31.760 --> 00:20:34.799 So we can put this pattern in the 00:20:34.799 --> 00:20:40.159 c-mode section. 00:20:40.159 --> 00:20:48.000 And now if we enable tree-sitter, 00:20:48.000 --> 00:20:50.480 you can see that this is highlighted 00:20:50.480 --> 00:20:53.200 uh 00:20:53.200 --> 00:20:55.520 as a normal function definition. So this 00:20:55.520 --> 00:20:56.559 is the function 00:20:56.559 --> 00:21:01.200 face, like we wanted. 00:21:01.200 --> 00:21:03.760 The pattern for this is actually pretty 00:21:03.760 --> 00:21:07.200 simple. 00:21:07.200 --> 00:21:10.720 It's only 00:21:10.720 --> 00:21:14.720 this part. So 00:21:14.720
--> 00:21:17.440 if it's a function call where the name 00:21:17.440 --> 00:21:19.679 of the function is DEFUN, 00:21:19.679 --> 00:21:21.600 then we highlight the DEFUN as a 00:21:21.600 --> 00:21:24.240 keyword, 00:21:24.240 --> 00:21:27.360 and then the first string argument, we 00:21:27.360 --> 00:21:28.159 highlight it 00:21:28.159 --> 00:21:35.360 as a function name. 00:21:35.360 --> 00:21:37.679 Since the language objects are actually 00:21:37.679 --> 00:21:39.280 native code, 00:21:39.280 --> 00:21:40.799 they have to be compiled for each 00:21:40.799 --> 00:21:43.440 platform that we want to support. 00:21:43.440 --> 00:21:45.600 This would become a big obstacle for 00:21:45.600 --> 00:21:48.159 tree-sitter adoption. 00:21:48.159 --> 00:21:50.240 Therefore, I've created a language bundle 00:21:50.240 --> 00:21:52.960 package, tree-sitter-langs, 00:21:52.960 --> 00:21:54.960 that takes care of pre-compiling the 00:21:54.960 --> 00:21:56.320 grammars, the 00:21:56.320 --> 00:21:59.679 most common grammars, for all three major 00:21:59.679 --> 00:22:01.600 platforms. 00:22:01.600 --> 00:22:04.080 It also takes care of distributing these 00:22:04.080 --> 00:22:05.360 binaries 00:22:05.360 --> 00:22:08.080 and provides highlighting queries 00:22:08.080 --> 00:22:11.440 for some of the languages. 00:22:11.440 --> 00:22:13.760 It should be noted that this package 00:22:13.760 --> 00:22:15.919 should be treated as a temporary 00:22:15.919 --> 00:22:19.919 distribution mechanism only, 00:22:19.919 --> 00:22:22.240 to help with bootstrapping tree-sitter 00:22:22.240 --> 00:22:24.720 adoption. 00:22:24.720 --> 00:22:27.760 The plan is that eventually these files 00:22:27.760 --> 00:22:29.760 should be provided by the language major 00:22:29.760 --> 00:22:32.480 modes themselves, 00:22:32.480 --> 00:22:35.120 but in order to do that, we need better 00:22:35.120 --> 00:22:36.320 tooling, 00:22:36.320 --> 00:22:40.240 so we're not there yet. 00:22:40.240 --> 00:22:42.559 Since the
core already works reasonably 00:22:42.559 --> 00:22:43.280 well, 00:22:43.280 --> 00:22:44.640 there are several areas that would 00:22:44.640 --> 00:22:46.320 benefit from the community's 00:22:46.320 --> 00:22:49.120 contributions. 00:22:49.120 --> 00:22:51.520 Tree-sitter's upstream language 00:22:51.520 --> 00:22:52.640 repositories 00:22:52.640 --> 00:22:54.400 already contain highlighting queries of 00:22:54.400 --> 00:22:55.679 their own. 00:22:55.679 --> 00:22:58.480 However, they are pretty basic, and they 00:22:58.480 --> 00:23:00.480 may not fit well with existing Emacs 00:23:00.480 --> 00:23:02.559 conventions. 00:23:02.559 --> 00:23:04.320 Therefore, the language bundle has its 00:23:04.320 --> 00:23:07.120 own set of highlighting queries. 00:23:07.120 --> 00:23:10.559 This requires maintenance until language 00:23:10.559 --> 00:23:11.600 major modes adopt 00:23:11.600 --> 00:23:13.760 tree-sitter and maintain the queries on 00:23:13.760 --> 00:23:16.640 their own. 00:23:16.640 --> 00:23:18.480 The queries are actually quite easy to 00:23:18.480 --> 00:23:22.000 write, as you've already seen. 00:23:22.000 --> 00:23:24.240 You just need to be familiar with the 00:23:24.240 --> 00:23:25.360 language, 00:23:25.360 --> 00:23:30.000 familiar enough to come up with sensible 00:23:30.000 --> 00:23:35.200 highlighting patterns. 00:23:35.200 --> 00:23:37.600 And if you are a maintainer of a 00:23:37.600 --> 00:23:39.679 language major mode, 00:23:39.679 --> 00:23:42.320 you may want to consider integrating 00:23:42.320 --> 00:23:43.360 tree-sitter into 00:23:43.360 --> 00:23:46.960 your mode, initially maybe as an 00:23:46.960 --> 00:23:50.080 optional feature. The integration is 00:23:50.080 --> 00:23:53.279 actually pretty straightforward, 00:23:53.279 --> 00:23:56.640 especially for syntax highlighting. 00:23:56.640 --> 00:24:01.520 Or alternatively, 00:24:01.520 --> 00:24:03.760 you can try writing a new major 00:24:03.760 --> 00:24:04.640 mode 00:24:04.640 -->
00:24:08.000 from scratch that relies on tree-sitter 00:24:08.000 --> 00:24:12.559 from the very beginning. 00:24:12.559 --> 00:24:16.320 The code for such a major mode is 00:24:16.320 --> 00:24:19.679 quite simple. For example, 00:24:19.679 --> 00:24:23.200 this is the proposed 00:24:23.200 --> 00:24:26.240 wat-mode for WebAssembly. 00:24:26.240 --> 00:24:31.039 The code is just 00:24:31.039 --> 00:24:34.559 about one page of code, 00:24:34.559 --> 00:24:39.520 not a lot. 00:24:39.520 --> 00:24:42.720 You can also try writing new minor modes 00:24:42.720 --> 00:24:46.559 or writing integration packages. 00:24:46.559 --> 00:24:50.080 For example, a lot of 00:24:50.080 --> 00:24:50.880 packages 00:24:50.880 --> 00:24:54.559 may benefit from tree-sitter integration, 00:24:54.559 --> 00:24:58.840 but no one has written the integration 00:24:58.840 --> 00:25:02.960 yet. 00:25:02.960 --> 00:25:05.039 If you are interested in tree-sitter, you 00:25:05.039 --> 00:25:06.720 can use these links to 00:25:06.720 --> 00:25:10.320 learn more about it. I think that's it 00:25:10.320 --> 00:25:11.440 for me today. 00:25:11.440 --> 00:25:18.159 I'm happy to answer any questions.
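The single-quoted-string query from the demo combines a node-type pattern, a capture, and a `#match?` predicate. In upstream tree-sitter query syntax it would read roughly `((string) @keyword (#match? @keyword "^'"))`; the Emacs binding may spell predicates differently. The Python sketch below models how those pieces interact on a hand-built tree. `Pattern`, `run_query`, and the "first matching pattern wins" rule are assumptions of this toy model, not tree-sitter's actual matching algorithm.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Toy syntax node: a type, its source text, and child nodes."""
    type: str
    text: str
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants in pre-order."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Pattern:
    """Match nodes of `node_type`, optionally filtered by a text predicate."""
    node_type: str
    capture: str
    predicate: Optional[Callable[[str], bool]] = None


def starts_with_single_quote(text: str) -> bool:
    # Plays the role of (#match? @capture "^'") in a real query.
    return re.match(r"^'", text) is not None


def run_query(root: Node, patterns: List[Pattern]) -> List[Tuple[str, str]]:
    """Capture each node with the first pattern that accepts it.

    "First pattern wins" is this toy model's precedence rule.
    """
    captures = []
    for node in root.walk():
        for p in patterns:
            if node.type == p.node_type and (p.predicate is None or p.predicate(node.text)):
                captures.append((p.capture, node.text))
                break
    return captures


patterns = [
    Pattern("string", "keyword", starts_with_single_quote),
    Pattern("string", "string"),
]
tree = Node("module", "x = 'a' + \"b\"", [Node("string", "'a'"), Node("string", '"b"')])
print(run_query(tree, patterns))
```

Reversing the order of `patterns` lets the unconditional pattern capture every string, so in this model the more specific, predicate-guarded pattern has to come first.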